Support Vector Regression for Software Reliability Growth Modeling and Prediction


Fei Xing¹ and Ping Guo²

¹ Department of Computer Science, Beijing Normal University, Beijing 100875, China. xsoar@163.com
² Laboratory of Computer Science, Institute of Software, Chinese Academy of Sciences, Beijing 100080, China. pguo@ieee.org

Abstract. In this work, we propose to apply support vector regression (SVR) to build a software reliability growth model (SRGM). SRGMs are an important aspect of software reliability engineering. Software reliability is the probability that a given software system will function without failure during a specified period of time in a specified environment. To obtain better SRGM performance, the practical selection of the SVR parameter C is discussed in the experiments. Experimental results with the classical Sys1 and Sys3 SRGM data sets show that the proposed SVR-based SRGM outperforms conventional SRGMs and achieves relatively good prediction and generalization ability.

1 Introduction

Software reliability is one of the key factors in software quality and one of the most important problems facing the software industry, where there is an increasing demand for delivering reliable software products. Developing reliable software is difficult because many factors impact development, such as resource limitations and unrealistic requirements. It is also hard to know whether or not the software being delivered is reliable. To address this problem, many software reliability models have been proposed over the past 30 years; they provide quantitative measures of the reliability of software systems during the software development process [1-3]. Software reliability growth models (SRGMs) have proven successful in estimating software reliability and the number of errors remaining in the software [2].
Using SRGMs, practitioners can assess current reliability, predict future reliability, and, furthermore, guide the software testing and debugging process. Many SRGMs now exist, such as the Goel-Okumoto model, the Yamada Delayed S-Shaped model, and the Yamada Weibull-Type Testing-Effort Function model. Most SRGMs assume that the fault process follows a curve of a specific type. In practice this assumption may not be realistic, and these SRGMs are sometimes insufficient and inaccurate for analyzing actual software failure data for reliability assessment.

J. Wang, X. Liao, and Z. Yi (Eds.): ISNN 2005, LNCS 3496, pp. 925-930, 2005.
© Springer-Verlag Berlin Heidelberg 2005
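To make the "curve of a specific type" assumption concrete, here is a minimal sketch (our illustration, not from the paper) of the mean value functions of two of the models just named; each fixes the shape of the cumulative-failure curve up to a few parameters:

```python
import numpy as np

def goel_okumoto(t, a, r):
    """Goel-Okumoto mean value function: expected cumulative failures
    by time t is a(1 - e^(-r t)), saturating at a total of a faults."""
    return a * (1 - np.exp(-r * t))

def yamada_s_shaped(t, a, r):
    """Yamada Delayed S-Shaped mean value function:
    a(1 - (1 + r t) e^(-r t)), giving an S-shaped growth curve."""
    return a * (1 - (1 + r * t) * np.exp(-r * t))

t = np.linspace(0.0, 20.0, 5)
print(np.round(goel_okumoto(t, a=100.0, r=0.3), 2))
print(np.round(yamada_s_shaped(t, a=100.0, r=0.3), 2))
```

Whatever failure data are observed, fitting such a model can only bend the curve within its fixed parametric family; this is the limitation that motivates the SVR approach below.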

In recent years, the support vector machine (SVM) [4] has emerged as a technique for pattern classification and universal approximation, and it has been demonstrated to be very valuable in several real-world applications [5, 6]. SVM is known to generalize well in most cases and is adept at modeling nonlinear functional relationships that are difficult to capture with other techniques. Consequently, we propose to apply support vector regression (SVR) to build SRGMs and investigate the conditions typically encountered in software reliability engineering. We believe these characteristics make SVR appropriate for SRGM.

2 Support Vector Regression

SVM was introduced by Vapnik on the foundation of statistical learning theory, whose development dates back to the late 1960s [7]. It was originally used for classification, but its principle extends easily to regression by introducing an alternative loss function. The basic idea of SVR is to map the input data x into a higher-dimensional feature space F via a nonlinear mapping φ, so that a linear regression problem is obtained and solved in that feature space.

Given a training set of l examples {(x_1, y_1), (x_2, y_2), ..., (x_l, y_l)} ⊂ R^n × R, where x_i is the input vector of dimension n and y_i is the associated target, we want to estimate the linear regression

    f(x) = (w · x) + b,    w ∈ R^n, b ∈ R.                           (1)

Here we consider the special case of the SVR problem with Vapnik's ε-insensitive loss function, defined as

    L_ε(y, f(x)) = { 0,                if |y − f(x)| ≤ ε
                   { |y − f(x)| − ε,   if |y − f(x)| > ε             (2)

The best line is the one that minimizes the cost function

    R(w) = (1/2)||w||² + C Σ_{i=1..l} L_ε(y_i, f(x_i)),              (3)

where C is a constant determining the trade-off between the training errors and the model complexity. By introducing the slack variables ξ_i, ξ_i*, we obtain a problem equivalent to Eq. 3. If an observed point lies above the tube, ξ_i is the positive amount by which the observed value exceeds the upper boundary of the tube.
Similarly, if the observed point lies below the tube, ξ_i* is the amount by which the observed value falls below the lower boundary of the tube. Written as a constrained optimization problem, this amounts to minimizing

    (1/2)||w||² + C Σ_{i=1..l} (ξ_i + ξ_i*)                          (4)

subject to:

    y_i − (w · x_i) − b ≤ ε + ξ_i
    (w · x_i) + b − y_i ≤ ε + ξ_i*                                   (5)
    ξ_i, ξ_i* ≥ 0
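As a minimal sketch (ours; the helper names are not from the paper), the loss of Eq. 2 and the regularized risk of Eq. 3 for a linear model can be written directly:

```python
import numpy as np

def eps_insensitive_loss(y, f, eps=0.1):
    """Vapnik's eps-insensitive loss (Eq. 2): zero inside the eps-tube
    around the prediction, linear in the violation outside it."""
    return np.maximum(np.abs(y - f) - eps, 0.0)

def primal_objective(w, b, X, y, C=1.0, eps=0.1):
    """Regularized risk of Eq. 3 for f(x) = w.x + b: a flatness term
    ||w||^2 / 2 plus C times the summed tube violations."""
    f = X @ w + b
    return 0.5 * float(np.dot(w, w)) + C * float(np.sum(eps_insensitive_loss(y, f, eps)))
```

A point inside the tube contributes nothing: eps_insensitive_loss(1.0, 1.05) is 0, while eps_insensitive_loss(1.0, 1.3) is ≈ 0.2. The slack variables ξ_i, ξ_i* of Eqs. 4-5 are exactly these positive parts, split by the side of the tube on which the point falls.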

To generalize to non-linear regression, we replace the dot product with a kernel function K(·,·), defined as K(x_i, x_j) = φ(x_i) · φ(x_j). By introducing Lagrange multipliers α_i, α_i*, associated with the upper and lower accuracy constraints of each training vector respectively, we obtain the dual problem, which maximizes

    Σ_{i=1..l} (α_i − α_i*) y_i − ε Σ_{i=1..l} (α_i + α_i*)
      − (1/2) Σ_{i,j=1..l} (α_i − α_i*)(α_j − α_j*) K(x_i, x_j)      (6)

subject to:

    Σ_{i=1..l} (α_i − α_i*) = 0
    0 ≤ α_i, α_i* ≤ C,    i = 1, 2, ..., l                           (7)

Finally, the estimate of the regression function at a given point x is

    f(x) = Σ_{i=1..l} (α_i − α_i*) K(x_i, x) + b                     (8)

3 Modeling the Software Reliability Growth

In this section, we apply SVR to real projects for software reliability growth generalization and prediction. The data sets are the Sys1 and Sys3 software failure data used for software reliability growth modeling in [2]. The Sys1 data set contains 54 data pairs and the Sys3 data set contains 278 data pairs. The data sets are first normalized to the range [0, 1]. The normalized successive failure occurrence times are the input of the SVR function, and the normalized accumulated failure number is its output. We denote the SVR-based software reliability growth model as SVRSRG.

Here we list the mathematical expressions of the three conventional SRGMs referred to in the experiments.

Goel-Okumoto model:

    m(t) = a(1 − e^{−rt}),    a > 0, r > 0                           (9)

Yamada Delayed S-Shaped model:

    m(t) = a(1 − (1 + rt)e^{−rt})                                    (10)

Yamada Weibull-Type Testing-Effort Function model:

    m(t) = a[1 − e^{−rα(1 − e^{−βt^γ})}]                             (11)

The approach taken to perform the modeling and prediction includes the following steps:

1. Model the reliability growth based on the raw failure data.
2. Estimate the model parameters.
3. Predict reliability based on the established model.

Three groups of experiments have been performed, with training error and testing error used as the evaluation criteria. In the tables presented in this paper, the training error and the testing error are measured by the sum of squares Σ_{i=1..l} (x_i − x̂_i)², where x_i and x̂_i are, respectively, the data set measurements and their predictions. By default, the SVR used in the experiments is ν-SVR, and the parameters ν and C are optimized by cross-validation.

In the generalization experiment, we partition the data into two parts: a training set and a test set. Two thirds of the samples are randomly drawn from the original data set as the training set, and the remaining one third serves as the testing set. This kind of training is called generalization training [8].

Figures 1(a) and 1(b) show the experimental results for software reliability growth modeling trained on the Sys1 and Sys3 data sets, respectively. It is obvious that SVRSRG fits the original data better than the other models. From Table 1 we find that both the training error and the testing error of SVRSRG are smaller than those of the other classical SRGMs.

Fig. 1. (a) The generalization curves of four SRGMs trained with the Sys1 data set. (b) The generalization curves of four SRGMs trained with the Sys3 data set. [Plots of number of failures vs. execution time for the original data, SVRSRG, Goel-Okumoto, S-Shaped, and Weibull-Type TE Function models; figure not reproduced.]

Table 1. The comparison of training error and testing error of four SRGMs for generalization

                               Sys1                    Sys3
  Model                  Training   Test        Training   Test
  Goel-Okumoto           0.1098     0.0576      0.1255     0.0672
  S-Shaped               0.3527     0.1722      0.1792     0.0916
  Weibull-type TE Fn     0.0255     0.0137      0.1969     0.1040
  SVRSRG                 0.0065     0.0048      0.0147     0.0078

In the prediction experiment, we simulate the practical process of software reliability growth prediction. It is based on predicting future values in the manner of time series prediction methods. Assuming the software has been

executed for time x_i, and considering the i data pairs (x_1, y_1), (x_2, y_2), ..., (x_i, y_i), we calculate the predicted number of failures y_{i+1} at time x_{i+1} in the following way. First, use the first i data pairs to build the model. Second, predict the value at time x_{i+1} using the current model. The experiment is carried out at each time point, and the mean training error and mean testing error are reported. From Table 2, we find that in this practical setting SVRSRG achieves markedly better performance than the other SRGMs.

Table 2. The comparison of training error and testing error of four SRGMs for prediction

                               Sys1                    Sys3
  Model                  Training   Test        Training   Test
  Goel-Okumoto           0.0259     0.0038      0.0864     0.0011
  S-Shaped               0.0855     0.0110      0.1663     0.0015
  Weibull-type TE Fn     0.0127     0.0012      0.0761     0.0007
  SVRSRG                 0.0064     0.0021      0.0138     0.0001

Fig. 2. The influence of parameter C on model generalization ability; the upper line is the training error, the lower line the testing error. (a) With the Sys1 data set. (b) With the Sys3 data set. [Plots of error vs. C; figure not reproduced.]

The parameter C in SVR is a regularization constant determining the trade-off between the training error and the model flatness. The following experiment demonstrates the influence of C on generalization ability. It is conducted like the prediction experiment, simulating the practical process of software reliability growth prediction; the results are shown in Figs. 2(a) and 2(b). With increasing C, the training error declines gradually; that is, the model fits the training data better and better. As for testing, the testing error first declines gradually, because the model's complexity becomes better matched to the data.
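The one-step-ahead procedure and the C sweep just described can be sketched as follows. This is a hypothetical illustration on synthetic data using scikit-learn's NuSVR as the ν-SVR implementation; the Sys1/Sys3 measurements and the paper's cross-validated parameters are not reproduced here:

```python
import numpy as np
from sklearn.svm import NuSVR

# Synthetic stand-in for a normalized failure data set: failure occurrence
# times and cumulative failure counts scaled to [0, 1], as in Section 3.
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 1.0, 40))
y = 1.0 - np.exp(-3.0 * t) + rng.normal(0.0, 0.01, t.size)

def one_step_errors(C, start=10):
    """Prediction experiment: at each time point, train nu-SVR on the
    first i pairs and predict the (i+1)-th value; return the mean
    training and mean testing sum-of-squares errors over all points."""
    tr, te = [], []
    for i in range(start, t.size - 1):
        m = NuSVR(nu=0.5, C=C).fit(t[:i].reshape(-1, 1), y[:i])
        tr.append(np.sum((m.predict(t[:i].reshape(-1, 1)) - y[:i]) ** 2))
        te.append((m.predict(t[i + 1 : i + 2].reshape(-1, 1))[0] - y[i + 1]) ** 2)
    return float(np.mean(tr)), float(np.mean(te))

# Sweep C as in Fig. 2: the training error shrinks as C grows, while the
# testing error typically falls first and rises once the model overfits.
for C in (0.01, 0.1, 1.0, 10.0, 100.0):
    print(C, one_step_errors(C))
```

The exact error values depend on the synthetic data and on ν; only the qualitative trend over C is the point of the sketch.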
The testing error then rises because of overfitting. The trade-off between the training error and the model flatness can be resolved by

the cross-validation technique, which divides the training samples into two parts, one for training and one for validation, to obtain satisfactory generalization ability.

4 Conclusions

A new technique for software reliability growth modeling and prediction is proposed in this paper: SVR is adopted to build the SRGM. The experiments show that the proposed SVR-based SRGM performs well; in both generalization ability and predictive ability, SVRSRG is better than the conventional SRGMs. The experimental results show that our approach offers a very promising technique for software reliability growth modeling and prediction.

Acknowledgements

This work was fully supported by a grant from the NSFC (Project No. 60275002) and by the Project sponsored by SRF for ROCS, SEM.

References

1. Musa, J.D.: Software Reliability Engineering: More Reliable Software, Faster Development and Testing. McGraw-Hill (1999)
2. Lyu, M.R.: Handbook of Software Reliability Engineering. IEEE Computer Society Press and McGraw-Hill (1996)
3. Guo, P., Lyu, M.R.: A Pseudoinverse Learning Algorithm for Feedforward Neural Networks with Stacked Generalization: Application to Software Reliability Growth Data. Neurocomputing 56 (2004) 101-121
4. Cortes, C., Vapnik, V.: Support-Vector Networks. Machine Learning 20 (1995) 273-297
5. Joachims, T.: Learning to Classify Text Using Support Vector Machines: Methods, Theory, and Algorithms. Kluwer (2002)
6. Xing, F., Guo, P.: Classification of Stellar Spectral Data Using SVM. In: International Symposium on Neural Networks (ISNN 2004). Lecture Notes in Computer Science, Vol. 3173. Springer-Verlag (2004) 616-621
7. Vapnik, V.N.: The Nature of Statistical Learning Theory. Springer-Verlag, New York (1995)
8. Karunanithi, N., Whitley, D., Malaiya, Y.: Prediction of Software Reliability Using Connectionist Models. IEEE Transactions on Software Engineering 18 (1992) 563-574