EMPIRICAL ANALYSIS OF FAULT PREDICATION TECHNIQUES FOR IMPROVING SOFTWARE PROCESS CONTROL

Similar documents
A SOFTWARE MODEL FOR THE MULTILAYER PERCEPTRON

Empirical Validate C&K Suite for Predict Fault-Proneness of Object-Oriented Classes Developed Using Fuzzy Logic.

ANN WHICH COVERS MLP AND RBF

Designing a learning system

Improving Template Based Spike Detection

Designing a learning system

New Fuzzy Color Clustering Algorithm Based on hsl Similarity

Euclidean Distance Based Feature Selection for Fault Detection Prediction Model in Semiconductor Manufacturing Process

Ones Assignment Method for Solving Traveling Salesman Problem

Probabilistic Fuzzy Time Series Method Based on Artificial Neural Network

Criterion in selecting the clustering algorithm in Radial Basis Functional Link Nets

Pruning and Summarizing the Discovered Time Series Association Rules from Mechanical Sensor Data Qing YANG1,a,*, Shao-Yu WANG1,b, Ting-Ting ZHANG2,c

3D Model Retrieval Method Based on Sample Prediction

Pattern Recognition Systems Lab 1 Least Mean Squares

Neural Networks A Model of Boolean Functions

A New Morphological 3D Shape Decomposition: Grayscale Interframe Interpolation Method

Optimization for framework design of new product introduction management system Ma Ying, Wu Hongcui

New HSL Distance Based Colour Clustering Algorithm

Administrative UNSUPERVISED LEARNING. Unsupervised learning. Supervised learning 11/25/13. Final project. No office hours today

Task scenarios Outline. Scenarios in Knowledge Extraction. Proposed Framework for Scenario to Design Diagram Transformation

Image Segmentation EEE 508

A new algorithm to build feed forward neural networks.

Performance Plus Software Parameter Definitions

Octahedral Graph Scaling

Analysis of Server Resource Consumption of Meteorological Satellite Application System Based on Contour Curve

CS 683: Advanced Design and Analysis of Algorithms

Intrusion Detection using Fuzzy Clustering and Artificial Neural Network

Mobile terminal 3D image reconstruction program development based on Android Lin Qinhua

Software Fault Prediction of Unlabeled Program Modules

Neuro Fuzzy Model for Human Face Expression Recognition

Keywords Software Architecture, Object-oriented metrics, Reliability, Reusability, Coupling evaluator, Cohesion, efficiency

Evaluation of Support Vector Machine Kernels for Detecting Network Anomalies

The Closest Line to a Data Set in the Plane. David Gurney Southeastern Louisiana University Hammond, Louisiana

Lecture 21: Variation Risk Management

arxiv: v2 [cs.ds] 24 Mar 2018

A Modified Multiband U Shaped and Microcontroller Shaped Fractal Antenna

Analysis of Documents Clustering Using Sampled Agglomerative Technique

Accuracy Improvement in Camera Calibration

CSCI 5090/7090- Machine Learning. Spring Mehdi Allahyari Georgia Southern University

Cluster Analysis. Andrew Kusiak Intelligent Systems Laboratory

VALIDATING DIRECTIONAL EDGE-BASED IMAGE FEATURE REPRESENTATIONS IN FACE RECOGNITION BY SPATIAL CORRELATION-BASED CLUSTERING

BASED ON ITERATIVE ERROR-CORRECTION

Cubic Polynomial Curves with a Shape Parameter

THIN LAYER ORIENTED MAGNETOSTATIC CALCULATION MODULE FOR ELMER FEM, BASED ON THE METHOD OF THE MOMENTS. Roman Szewczyk

Bayesian approach to reliability modelling for a probability of failure on demand parameter

Improving Information Retrieval System Security via an Optimal Maximal Coding Scheme

DATA MINING II - 1DL460

( n+1 2 ) , position=(7+1)/2 =4,(median is observation #4) Median=10lb

Investigating methods for improving Bagged k-nn classifiers

HADOOP: A NEW APPROACH FOR DOCUMENT CLUSTERING

Assignment Problems with fuzzy costs using Ones Assignment Method

Performance Comparisons of PSO based Clustering

An Improved Shuffled Frog-Leaping Algorithm for Knapsack Problem

A Model for Estimation of Efforts in Development of Software Systems

Solving Fuzzy Assignment Problem Using Fourier Elimination Method

Harris Corner Detection Algorithm at Sub-pixel Level and Its Application Yuanfeng Han a, Peijiang Chen b * and Tian Meng c

New Results on Energy of Graphs of Small Order

Intermediate Statistics

A Comparative Study of Positive and Negative Factorials

Sectio 4, a prototype project of settig field weight with AHP method is developed ad the experimetal results are aalyzed. Fially, we coclude our work

Chapter 1. Introduction to Computers and C++ Programming. Copyright 2015 Pearson Education, Ltd.. All rights reserved.

Outline. Research Definition. Motivation. Foundation of Reverse Engineering. Dynamic Analysis and Design Pattern Detection in Java Programs

Service Oriented Enterprise Architecture and Service Oriented Enterprise

Fuzzy Linear Regression Analysis

Using a Dynamic Interval Type-2 Fuzzy Interpolation Method to Improve Modeless Robots Calibrations

SOFTWARE usually does not work alone. It must have

Elementary Educational Computer

9.1. Sequences and Series. Sequences. What you should learn. Why you should learn it. Definition of Sequence

Journal of Chemical and Pharmaceutical Research, 2013, 5(12): Research Article

Our second algorithm. Comp 135 Machine Learning Computer Science Tufts University. Decision Trees. Decision Trees. Decision Trees.

Evaluation scheme for Tracking in AMI

. Written in factored form it is easy to see that the roots are 2, 2, i,

x x 2 x Iput layer = quatity of classificatio mode X T = traspositio matrix The core of such coditioal probability estimatig method is calculatig the

Structuring Redundancy for Fault Tolerance. CSE 598D: Fault Tolerant Software

Data Analysis. Concepts and Techniques. Chapter 2. Chapter 2: Getting to Know Your Data. Data Objects and Attribute Types

Descriptive Statistics Summary Lists

Protected points in ordered trees

Diego Nehab. n A Transformation For Extracting New Descriptors of Shape. n Locus of points equidistant from contour

Data-Driven Nonlinear Hebbian Learning Method for Fuzzy Cognitive Maps

Model Based Design: develpment of Electronic Systems

South Slave Divisional Education Council. Math 10C

Algorithms for Disk Covering Problems with the Most Points

Learning Sensor Data Characteristics in Unknown Environments.

An Anomaly Detection Method Based On Deep Learning

CLUSTERING TECHNIQUES TO ANALYSES IN DENSITY BASED SOCIAL NETWORKS

IMP: Superposer Integrated Morphometrics Package Superposition Tool

Using Support Vector Machines for Direct Marketing Models

Study on effective detection method for specific data of large database LI Jin-feng

A Novel Hybrid Algorithm for Software Cost Estimation Based on Cuckoo Optimization and K-Nearest Neighbors Algorithms

Image based Cats and Possums Identification for Intelligent Trapping Systems

Appendix A. Use of Operators in ARPS

Ontology-based Decision Support System with Analytic Hierarchy Process for Tour Package Selection

Compactness of Fuzzy Sets

Parallel Learning of Large Fuzzy Cognitive Maps

Analysis of Class Design Coupling Based on Information Entropy Di Jiang 1,2, a, Hua Zhou 1,2,b and Xingping Sun 1,2,c

Mining from Quantitative Data with Linguistic Minimum Supports and Confidences

Big-O Analysis. Asymptotics

Text Summarization using Neural Network Theory

Arithmetic Sequences

Theory of Fuzzy Soft Matrix and its Multi Criteria in Decision Making Based on Three Basic t-norm Operators

Transcription:

Iteratioal Joural of Iformatio Techology ad Kowledge Maagemet July-December 2012, Volume 5, No. 2, pp. 371-375 EMPIRICAL ANALYSIS OF FAULT PREDICATION TECHNIQUES FOR IMPROVING SOFTWARE PROCESS CONTROL Kamaljit Kaur* ABSTRACT: I this paper, we preset the applicatio of the eural etwork for the idetificatio of Reusable Software modules i Orieted Software System. Metrics are used for the structural aalysis of the differet procedures. The values of Metrics will become the iput dataset for the eural etwork systems ad Fuzzy Systems. Traiig Algorithm based o Neural Network ad fuzzy clusterig are experimeted ad the results are recorded i terms of Accuracy, Mea Absolute Error (MAE) ad Root Mea Square Error (RMSE). The results show that Fuzzy Clusterig based techique have show better results tha Resiliet Backpropagatio o the basis of testig data i terms of predictio techique.hece the proposed model ca be used to improve the productivity ad quality of software developmet. Keywords: Software reusability, Neural Networks, MAE, RMSE, Accuracy. 1. INTRODUCTION A software bug is a error, flaw, mistake, failure, or fault i a program that prevets it from behavig as iteded(e.g., producig a icorrect result). Most bugs arise from mistakes ad errors made by people i either a program's source code or its desig, ad a few are caused by compilers producig icorrect code [1]. The cost of?dig ad?xig faults i software typically rises as the developmet project progresses ito a ew phase. Faults that are foud after the system has bee delivered to the customer are may times more expesive to track dow ad correct tha if foud durig a earlier phase[2]. Metrics is defied as The cotiuous applicatio of measuremet based techiques to the software developmet process ad its products to supply meaigful ad timely maagemet iformatio, together with the use of those techiques to improve that process ad its products [3]. Software metrics is all about measuremet ad these are applicable to all the phases of software developmet life cycle from iitiatio to maiteace. The mai aim of this work is to model the impact of faults i fuctio based software modules. The mai objectives are described as follows: To fid the structural code ad desig attributes of software systems Fid the best algorithms that ca be used to model impact of faults i object orieted i.e. the predict the level of impact of the faults i the software system. This paper is orgaized as follows: Sectio two describes the Methodology part of work doe, which shows * RIMT, Madi Gobidgarh, Pujab, Er.kamaljitkaur@yahoo.com the steps used i order to reach the objectives ad carry out the results. I the sectio three, results of the implemetatio are discussed. I the last sectio, o the basis of the discussio various Coclusios are draw ad the future scope for the preset work is discussed. 2. PROPOSED METHODOLOGY The methodology cosists of the followig steps: 2.1 Fid the Structural Code ad Desig Attributes The first step is to fid the structural code ad desig attributes of software systems i.e. software metrics. The realtime defect data sets are take from the NASA's MDP (Metric Data Program) data repository. The dataset is related to the safety critical software systems beig developed by NASA. 2.2 Collectio of Metric Values The suitable metrics like product module metrics out of these data sets are cosidered. The term product is used referrig to module level data. The term metrics data applies to ay fiite umeric values, which describe measured qualities ad characteristics of a product. The term product refers to aythig to which defect data ad metrics data ca be associated. I most cases products will be syoymous with code related items such a fuctios ad systems/sub-systems. 2.3 Aalyze ad Refie Metrics the Metric Values I the ext step the metrics are aalyzed, refied ad ormalized ad the used for modelig of fault tolerace i software systems. I the step, table of module levels metrics

372 INTERNATIONAL JOURNAL OF INFORMATION TECHNOLOGY AND KNOWLEDGE MANAGEMENT PC5_product_module_metrics is joied with PC5_defect_product_relatios ad thereafter agai the joi operatio of the resultat table is performed with PC5_static_defect_data. A Etity-Relatioship diagram relates Modules to Defects ad Defects to Severity of Defects is show i figure 1. Figure 1: A Etity-Relatioship Diagram Relates Modules to Defects ad Defects to Severity of Defects I the figure 1 the MODULE_ID is the uique umeric idetifier of the module ad DEFECT_ID is the uique umeric idetifier of the associated defect. The SEVERITY field i the PC5_static_defect_data table shows the value that quatifies the impact of the defect o the overall eviromet i the rage of 1 to 5. Where, 1 meas most severe ad 5 beig least severe. For example, severity 1 may imply that the defect caused a loss of fuctioality without a workaroud where severity 5 may mea that the impact is superficial ad did ot cause ay major disruptios to the system. 2.4 Explore Techiques Based o Neural Network ad Fuzzy 2.4.1 Resiliet Back-propagatio Algorithm It is a supervised learig method, ad is a geeralizatio of the delta rule. It requires a dataset of the desired output for may iputs, makig up the traiig set. It is most useful for feed-forward etworks (etworks that have o feedback, or simply, that have o coectios that loop). The term is a abbreviatio for backward propagatio of errors. Backpropagatio requires that the activatio fuctio used by the artificial euros (or odes ) be differetiable. The algorithm acts o each weight separately. For each weight, if there was a sig chage of the partial derivative of the total error fuctio compared to the last iteratio, the update value for that weight is multiplied by a factor, where 0 < < 1. If the last iteratio produces the same sig, the update value is multiplied by a factor of +, where + > 1. The update values are calculated for each weight i the above maer, ad fially each weight is chaged by its ow update value, i the opposite directio of that weight s partial derivative. This is to miimize the total error fuctio. + is empirically set to 1.2 ad to 0.5. Phase 1: Propagatio Each propagatio ivolves the followig steps: 1. Forward propagatio of a traiig patter s iput through the eural etwork i order to geerate the propagatio s output activatios. 2. Backward propagatio of the propagatio s output activatios through the eural etwork usig the traiig patter s target i order to geerate the deltas of all output ad hidde euros. Phase 2: Weight update For each weight-syapse follow the followig steps: 1. Multiply its output delta ad iput activatio to get the gradiet of the weight. 2. Brig the weight i the opposite directio of the gradiet by subtractig a ratio of it from the weight. This ratio iflueces the speed ad quality of learig; it is called the learig rate. The sig of the gradiet of a weight idicates where the error is icreasig, this is why the weight must be updated i the opposite directio. Repeat phase 1 ad 2 util the performace of the etwork is satisfactory. 2.4.2 Fuzzy Clusterig Clusterig ca be a very effective techique to idetify atural groupigs i data from a large data set, thereby allowig cocise represetatio of relatioships embedded

EMPIRICAL ANALYSIS OF FAULT PREDICATION TECHNIQUES FOR IMPROVING SOFTWARE PROCESS CONTROL 373 i the data. I our study, clusterig allows us to group software modules ito faulty ad o-faulty categories hece allowig for easier uderstadability. Mai Steps of fuzzy clusterig algorithm[6] : Step 1: Calculate the iput data to be clusters. X ij, i = 1, 2,..., ; j = 1, 2,..., m is the umber of data m is the type of data Step 2: Set the variables value: i- r j, j = 1, 2,..., m ii- quash factor iii- * accept ratio iv- reject ratio v- X jmi vi- X jmax Step 3: Set the ormal data value based o Xj-mi da Xj-max use with the followig model: Xij X orm j mi Xij, i 1,2,..., ; j 1,2,..., m X X j mi j mi Step 4: Set the potetial of each data poit by the formula: a = 1, revise to a = if m = 1, set i 1 xt xk 4 ra P ' e, i 1,2,..., ; k 1,2,..., ; i k i Step 5: Set the highest potetial value of data: M max Pi i 1,2,..., h i, so that D i M Step 6: Set cluster cetre ad update the potetial value that correspod to aother data: i- Ct = [] ii- V j = X hj, j = 1, 2,...m iii- C = 0 (umber of clusters) iv- Cd = 1 v- z = m vi- Do Cd 0 ad Z 0 Step 7: Put the real data: Ct Ct * ( X X ) X ij ij j max j mi j mi Step 8: Set the cluster sigma: j r ( X X ) * j j max j mi 2.5 Compariso of the Algorithms The comparisos are made o the basis of the more accuracy ad least value of MAE ad RMSE error values. Accuracy value of the predictio model is the major criteria used for compariso. The mea absolute error is chose as the stadard error. The techique havig lower value of mea absolute error is chose as the best fault predictio techique. Mea absolute error Mea absolute error, MAE is the average of the differece betwee predicted ad actual value i all test cases; it is the average predictio error [4]. The formula for calculatig MAE is give i equatio show below: a1 c1 a2 c2... a c Assumig that the actual output is a, expected output is c. Root mea-squared error RMSE is frequetly used measure of differeces betwee values predicted by a model or estimator ad the values actually observed from the thig beig modeled or estimated [7]. It is just the square root of the mea square error as show i equatio give below: ( a c ) ( a c )... ( a c ) 2 2 2 1 1 2 2 The mea-squared error is oe of the most commoly used measures of success for umeric predictio. This value is computed by takig the average of the squared differeces betwee each computed value ad its correspodig correct value. The root mea-squared error is simply the square root of the mea-squared-error. The root mea-squared error gives the error value the same dimesioality as the actual ad predicted values. The mea absolute error ad root mea squared error is calculated for each machie learig algorithm i.e. Neural Network. 3. EXPERIMENTAL RESULTS AND DISCUSSIONS I this sectio, we discuss the experimetal results obtaied usig Resiliet Backpropagatio traiig algorithm, Such algorithm is explored ot oly for accuracy poit of view but also for comparig it with fuzzy clusterig. Experimets are coducted o NASA s public domai defect dataset. 8

374 INTERNATIONAL JOURNAL OF INFORMATION TECHNOLOGY AND KNOWLEDGE MANAGEMENT 3.1 Data Set As metioed above, experimets are coducted o NASA s public domai defect dataset. The real time defect data set used is take from the NASA's MDP (Metric Data Program) data repository, the details of that dataset cotais 293 Object Orieted modules with differet values of impact of faults labeled as 1, 2, 3, 4 ad 5. The severity level 5 are ot preset i the dataset. So, the level 1 represets the highest severity, level 2 represets of less severity as compared to level 1 ad level 3 represets the medium fault ad level 4 represet the mior fault that ca be overlooked to save time. Details of the Type of Modules i the Dataset are show i Table 1 i tabular form ad Figure 2 i graphical form. cases. The root mea-squared error i.e. RMSE is simply the square root of the mea-squared-error. The root measquared error gives the error value as the same dimesioality as the actual ad predicted values. Table 1 Details of the Type of Modules i the Dataset Severity of Impact Level Cout 1 48 2 207 3 28 4 10 Figure 3: The Traiig Phase Performace of the Resiliet Backpropagatio I the preset work the five Neural Network based algorithms experimeted i Matlab 7.7 ad after the traiig each traied etwork is tested with testig dataset of 15 values derived from the PC4 dataset. The overall testig performace of the differet algorithms is show i table 2. The results reveal that the Resiliet Backpropagatio algorithm have outperformed all other algorithm uder study with 0.3980, 0.5385 ad 80% as MAE, RMSE ad Accuracy values respectively. Table 2 Performace Results of Differet Resiliet Backpropagatio ad Fuzzy Clusterig Algorithms Figure 2: Graphical Represetatio of Details of the Type of Modules i the Dataset X axis belogs to Level of Severity of Faults Y axis belogs to Number of Modules The algorithms are evaluated o the basis of the followig criteria: The developed software computes the mea absolute error, root mea squared error, relative absolute error ad root relative squared error. However, the most commoly reported error is the mea absolute error ad root mea squared error. The root mea squared error is more sesitive to outliers i the data tha the mea absolute error. I order to miimize the effect of outliers, mea absolute error is chose as the stadard error. The predictio techique havig least value of mea absolute error is chose as the best predictio techique. Mea absolute error, MAE is the average of the differece betwee predicted ad actual value i all test Sr. No. Algorithm MAE RMSE Accuracy% 1 Resiliet 0.3408 0.5385 82.2526 Backpropagatio 2 Fuzzy clusterig 0.2982 0.4200 94.2262 REFERENCE [1] http://puretest.blogspot.i/2009/11/1.html [2] Barry Boehm ad Victor R. Basili. Software Defect Reductio Top 10 List. Computer, 34(1), 135-147, Jauary 2001. [3] Bibi S., Tsoumakas G., Stamelos I., Vlahavas I., Software Defect Predictio Usig Regressio via Classificatio, IEEE Iteratioal Coferece o Computer Systems ad Applicatios, Issue Date: March 8, 2006, pp. 330-336, ( h t t p : / / i e e e x p l o r e. i e e e. o r g / x p l / f r e e a b s _ a l l. jsp?arumber=1618375) [4] Challagulla V.U.B., Bastai, F.B., I-Lig Ye, Paul, (2005). Empirical Assessmet of Machie Learig Based Software Defect Predictio Techiques, 10th IEEE Iteratioal Workshop o Object- Orieted Real-Time

EMPIRICAL ANALYSIS OF FAULT PREDICATION TECHNIQUES FOR IMPROVING SOFTWARE PROCESS CONTROL 375 Depedable Systems, WORDS 2005, 2-4 Feb 2005, pp. 263-270. [5] Mahaweerawat, A. (2004). Fault-Predictio i Object Orieted Software s Usig Neural Network Techiques, Advaced Virtual ad Itelliget Computig Ceter (AVIC), Departmet of Mathematics, Faculty of Sciece, Chulalogkor Uiversity, Bagkok. [6] Agus Priyoo, Muhammad Ridwa, Ahmad Jais Alias, Riza Atiq, O.K. Rahmat, Azmi Hassa ad Mohd. Alauddi Mohd. Ali, Geeratio of Fuzzy Rules with Subtractive Clusterig, Jural Tekologi, 43(D) Dis. 2005, pp. 143-153, Uiversiti Tekologi Malaysia. [7] Hudepohl J.P., S.J. Aud, T.M. Khoshgoftaar, E.B. Alle, ad J.E. Mayrad, (1996). Software Metrics ad Models o the Desktop, IEEE Software, 13(5), 56-60. [8] Satwider Sigh, K.S. Kahlo, Effectiveess of Ecapsulatio ad Object-orieted metrics to Refactor Code ad Idetify Error Proe Classes Usig Bad Smells, ACM SIGSOFT Software Egieerig Notes, 36(5), September 2011. [9] Pueet Mittal, Satwider Sigh, ad K.S. Kahlo, Idetificatio of Error Proe Classes for Fault Predictio Usig Object Orieted Metrics, ACC 2, Vol. 191 Spriger (2011), pp. 58-68.