Ian Watson 1, Emilia Mendes 1, Chris Triggs 2, Nile Mosley 3 & Steve Counsell 3

From: FLAIRS-02 Proceedings. Copyright 2002, AAAI (www.aaai.org). All rights reserved.

Using CBR to Estimate Development Effort for Web Hypermedia Applications

Ian Watson 1, Emilia Mendes 1, Chris Triggs 2, Nile Mosley 3 & Steve Counsell 3

1 Computer Science Dept. University of Auckland, New Zealand
2 Statistics Dept. University of Auckland, New Zealand
3 Computer Science Dept. Auckland University of Technology, New Zealand
Computer Science Dept. Birkbeck College, University of London, UK
{ian emilia}@cs.auckland.ac.nz

Abstract

Good estimates of development effort play an important role in the successful management of larger software development projects. This paper compares the prediction accuracy of three CBR techniques to estimate the effort to develop Web hypermedia applications. Most comparative studies have used one CBR technique. We believe this may bias the results, as there are several CBR techniques that may also be used for effort prediction. This paper shows that a weighted Euclidean similarity measure was the most accurate of the CBR techniques tested.

1 Introduction

Software practitioners recognise the importance of realistic estimates of effort to the successful management of software projects, the Web being no exception. Having realistic estimates at an early stage in a project's life cycle allows project managers and development organisations to manage resources effectively. Several techniques for cost and effort estimation have been proposed over the last 30 years, falling into three general categories: expert judgement, algorithmic models and machine learning []. Recently several comparisons have been made between the three categories of prediction techniques [2, 3 & ]. However, no convergence has been obtained to date. Most comparisons in the literature measure the prediction accuracy of techniques using attributes (e.g. lines of code, function points) of conventional software. This paper looks at prediction accuracy based on attributes of Web hypermedia applications instead.
Our research focus is on proposing and comparing development effort prediction models for Web hypermedia applications []. Readers interested in effort estimation models for Web software applications are referred to [5 & 6]. The metrics used in our study reflect current industrial practices for developing multimedia and Web hypermedia applications [7 & 8].

This paper compares the prediction accuracy of three CBR techniques to estimate the effort to develop Web hypermedia applications. As design decisions, when building CBR prediction systems, are influential upon the results [9], we wanted to reduce any bias that may hinder these results, before comparing them to other prediction models, the results of which are presented elsewhere. This objective is reflected in the following research question: will different combinations of parameter categories for the CBR technique generate statistically significantly different prediction accuracy?

These issues are investigated using a dataset containing 37 Web hypermedia projects developed by postgraduate and MSc students studying a Hypermedia and Multimedia Systems course at the University of Auckland. Several confounding factors, such as Web authoring experience, tools used and structure of the application developed, were controlled, so increasing the validity of the obtained data.

The remainder of the paper is organised as follows: Section 2 describes our research method. Section 3 presents the results for the comparison of CBR approaches and Section 4 presents our conclusions.

2 Research Method

2.1 Dataset

All analysis presented in this paper was based on a dataset containing information for 37 Web hypermedia applications developed by postgraduate students. The data set is described in detail in the companion paper. Each Web hypermedia application provided 6 pieces of data [], from which we identified 8 attributes, shown in Table 1, to characterise a Web hypermedia application and its development process.
These attributes form a basis for our data analysis. Total effort is our dependent/response variable and the other 7 attributes are our independent/predictor variables. All attributes were measured on a ratio scale. The criteria used to select the attributes were [7]: i) practical relevance for Web hypermedia developers; ii) metrics which are easy to learn and cheap to collect; iii) counting rules which were simple and consistent. The original dataset of 37 observations had three outliers where total effort was unrealistic. Those outliers were

removed from the dataset, leaving 34 observations. Total effort was calculated as:

\[ \text{Total-effort} = \text{PAE} + \sum_{j=0}^{m} \text{MAE} + \sum_{k=0}^{o} \text{PRE} \qquad (1) \]

where PAE is the page authoring effort, MAE the media authoring effort and PRE the program authoring effort []. A detailed description of threats and comments on the validity of the case study is presented in [].

Table 1 - Size and Complexity Metrics

Metric                       Description
Page Count (PaC)             Number of html or shtml files used in the application.
Media Count (MeC)            Number of media files used in the application.
Program Count (PRC)          Number of JavaScript files and Java applets used in the application.
Reused Media Count (RMC)     Number of reused/modified media files.
Reused Program Count (RPC)   Number of reused/modified programs.
Connectivity Density (COD)   Total number of internal links divided by Page Count.
Total Page Complexity (TPC)  Average number of different types of media per page.
Total Effort (TE)            Effort in person hours to design and author the application.

2.2 Evaluation Criteria

The most common approaches to assessing the predictive power of effort prediction models are:

- The Magnitude of Relative Error (MRE) [10]
- The Mean Magnitude of Relative Error (MMRE) []
- The Median Magnitude of Relative Error (MdMRE) [2]
- The Prediction at level n (Pred(n)) [3]
- Boxplots of residuals []

The MRE is defined as:

\[ \text{MRE}_i = \frac{\left| \text{ActualEffort}_i - \text{PredictedEffort}_i \right|}{\text{ActualEffort}_i} \qquad (2) \]

where i represents each observation for which effort is predicted. The mean of all MREs is the MMRE, which, for N observations, is calculated as:

\[ \text{MMRE} = \frac{1}{N} \sum_{i=1}^{N} \frac{\left| \text{ActualEffort}_i - \text{PredictedEffort}_i \right|}{\text{ActualEffort}_i} \qquad (3) \]

The mean takes into account the numerical value of every observation in the data distribution, and is sensitive to individual predictions with large MREs. An option to the mean is the median, which also represents a measure of central tendency; however, it is less sensitive to extreme values. The median of MRE values for the number i of observations is called the MdMRE. Another indicator which is commonly used is the Prediction at level l, also known as Pred(l).
It measures the percentage of estimates that are within l% of the actual values. Suggestions have been made [15] that l should be set at 25% and that a good prediction system should offer this accuracy level 75% of the time. In addition, other prediction accuracy indicators have been suggested as alternatives to the commonly used MMRE and Pred(n) []. One such indicator is to use boxplots of the residuals (actual-estimate) [16]. The statistical significance of all the results, except boxplots, was tested using the T-test for paired MREs and MMREs and the Wilcoxon Rank Sum Test or Mann-Whitney U Test for MdMREs. Both were generated using 1% and 5% levels of significance.

3 Comparing CBR Approaches

During the process of applying case-based reasoning users may need to choose five parameters, as follows:

1. Feature subset selection
2. Similarity measure
3. Scaling
4. Number of retrieved cases
5. Case adaptation

Each parameter in turn can be split into more detail, and incorporated or not for a given CBR tool. Based on that, the question asked here is: will different combinations of parameter categories for the CBR technique generate statistically significantly different prediction accuracy? In answer, we compared the prediction accuracy of several estimations generated using different categories for a given parameter. Estimations were generated using two CBR tools, namely ANGEL [17] and CBR-Works [18].

ANGEL was developed at Bournemouth University. An important feature is its ability to determine the optimum combination of attributes for retrieving analogies (cases). ANGEL compares similar projects by using the unweighted Euclidean distance using variables that have been standardised between 0 and 1 [17].

CBR-Works is a state-of-the-art commercial CBR environment [18]. It was a product of years of collaborative European research by the INRECA I & II projects [19]. It is available commercially from Empolis (www.tecinno.com).
The tool provides a variety of retrieval algorithms (Euclidean, weighted Euclidean, Maximum Similarity) as well as fine control over individual feature similarity metrics. In addition, it provides sophisticated support for symbolic features and taxonomy hierarchies as well as providing adaptation rules and formulae.

3.1 Feature subset selection

Feature subset selection involves determining the optimum subset of features that gives the most accurate estimation. ANGEL optionally offers this functionality by applying a

brute force algorithm, searching for all possible feature subsets. CBR-Works does not provide similar functionality.

Table 2 - Comparing FSS to NFSS

            Used FSS              Did not use FSS
            k=1    k=2    k=3     k=1    k=2    k=3
MMRE        0.09   0.     0.2     0.5    0.5    0.5
MdMRE       0.08   0.09   0.0     0.2    0.     0.3
Pred(25)    97     9      88      76     82     82

To investigate if the feature subset selection would help achieve better prediction accuracy, we used the ANGEL tool and leave-one-out cross-validation. The results are summarised in Table 2 and a boxplot of the residuals is presented in Figure 1. In Table 2, k represents the number of retrieved cases (K1, K2, K3), FSS stands for "Feature Subset Selection" and NFSS for "No Feature Subset Selection". It was observed that the estimations based on FSS were more accurate than those based on all seven attributes. The boxplots of the residuals show that the best predictions were obtained using one retrieved case (K1) + FSS option, followed by two cases (K2) + FSS, and 3 cases (K3) + FSS. These results were also confirmed by the values for MMRE, MdMRE and Pred(25).

Figure 1 - Boxplots of the residuals for FSS and NFSS (K1FSS, K1NFSS, K2FSS, K2NFSS, K3FSS, K3NFSS)

For the k=1 case, the MRE for FSS was significantly less than that for NFSS (α=0.01), using a T-test. For k=2 and 3 cases the difference between FSS and NFSS was not statistically significant. Comparing these results to the boxplots of residuals suggests that for k=1 the feature subset selection may indeed affect the accuracy of the prediction obtained.

3.2 Similarity Measure

To our knowledge, the similarity measure most frequently used in Software engineering and Web engineering literature is the unweighted Euclidean distance. In the context of this investigation we have used three measures of similarity, namely the unweighted Euclidean distance, the weighted Euclidean distance and the Maximum measure.

3.3 Scaling or Standardisation

Standardisation represents the transformation of attribute values such that all attributes are measured using the same unit.
One possible solution is to assign zero to the minimum observed value and one to the maximum observed value [9]. This is the strategy used by ANGEL and was the strategy chosen for part of the analysis carried out using CBR-Works.

3.4 Number of Retrieved Cases

The number of retrieved cases refers to the number of retrieved most similar cases that will be used to generate the estimation. For Angelis and Stamelos [20], when small sets of data are used it is reasonable to consider only a small number of cases. In this study we have used 1, 2 and 3 retrieved cases, similarly to [3, 7 & 20].

Table 3 - Comparison of CBR Techniques

Dist.  K  Adpt.   SV?  MMRE  MdMRE  Pred(25)
UE     1  Mean    Yes  0.2   0.0    88.2
                  No   0.    0.09   9.8
       2  Mean    Yes  0.5   0.2    82.35
                  No   0.3   0.     88.2
          IRWM    Yes  0.3   0.     85.29
                  No   0.2   0.     9.8
       3  Mean    Yes  0.    0.     82.35
                  No   0.2   0.0    9.8
          IRWM    Yes  0.3   0.2    85.29
                  No   0.    0.08   9.8
          Median  Yes  0.    0.0    76.7
                  No   0.    0.09   82.35
WE     1  Mean    Yes  0.0   0.09   9.2
                  No   0.    0.09   9.2
       2  Mean    Yes  0.3   0.     9.2
                  No   0.3   0.     9.2
          IRWM    Yes  0.2   0.     97.06
                  No   0.    0.     97.06
       3  Mean    Yes  0.3   0.09   88.2
                  No   0.2   0.09   88.2
          IRWM    Yes  0.2   0.2    9.2
                  No   0.2   0.2    9.2
          Median  Yes  0.    0.0    82.35
                  No   0.3   0.0    82.35
MX     1  Mean    Yes  0.32  0.     26.7
                  No   0.32  0.33   26.7
       2  Mean    Yes  0.23  0.7    67.65
                  No   0.23  0.7    67.65
          IRWM    Yes  0.25  0.23   58.82
                  No   0.25  0.23   58.82
       3  Mean    Yes  0.25  0.5    76.7
                  No   0.2   0.5    76.7
          IRWM    Yes  0.23  0.6    67.65
                  No   0.23  0.6    67.65
          Median  Yes  0.3   0.7    58.82
                  No   0.3   0.6    6.76

Dist. = distance; K = # of retrieved cases; Adpt. = adaptation; SV? = Standardised Variable?
UE = Unweighted Euclidean; WE = Weighted Euclidean; MX = Maximum

3.5 Case Adaptation

Once the most similar case(s) has/have been retrieved the next step is to decide how to generate the estimation. Choices of case adaptation techniques presented in the software engineering literature vary from the nearest neighbour [3], the mean of the closest cases [3], the median [20], inverse distance weighted mean and inverse rank weighted mean [9], to illustrate just a few. We opted for the mean (the average of k retrieved cases, when k>1),

median (the median of k retrieved cases, when k>2) and the inverse rank weighted mean, which allows more similar cases to have more influence than less similar ones (e.g., if we use 3 cases, the closest case would have weight = 3, the second closest weight = 2 and the last one weight = 1).

3.6 Comparison of techniques

The first question we wanted to answer was if there were any statistically significant differences between results obtained using Standardised and Non-standardised variables. A T-test (for MMREs) and a Mann-Whitney U Test (for MdMREs), for α=0.01 and α=0.05, did not reveal any statistically significant differences.

The second question was if there were any statistically significant differences between results obtained using different distances (Unweighted Euclidean, Weighted Euclidean and Maximum). This time we restricted our analysis to results obtained using standardised variables. Both a T-test (for MMREs and Pred(25)) and a Wilcoxon Signed Rank Test (for MdMREs), using α=0.01 and α=0.05, were performed (see Table 4).

Table 4 - Comparison of Distances

Distance   T-test      Wilcoxon test
UE x WE    3.796       -.633 *
UE x MX    -6.982 **   -2.207 *
WE x MX    -7.652 **   -2.207 *

UE = Unweighted Euclidean; WE = Weighted Euclidean; MX = Maximum
** statistically significant at 1%
* statistically significant at 5%

Table 5 - Comparison of Euclidean Distances

           1 analogy   2 analogies   3 analogies
UE x WE    .338        -2.00 *       0.60

WE = Weighted Euclidean; UE = Unweighted Euclidean
* statistically significant at 5%

It was no surprise to obtain statistically significant results when comparing the Maximum distance to any other type, as it gave much worse results than the other two. The Weighted Euclidean (WE) showed statistically significantly better results (α=0.01) than the Unweighted Euclidean (UE) for MMREs (Table 4) and paired MREs (Table 5), but not when we used MdMREs. Boxplots of the residuals (Figure 2) corroborate the results obtained using the T-test. The answer to our question was, therefore, positive: there are statistically significant differences between results obtained using different distances.
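The parameter choices compared above (min-max scaling, a weighted or unweighted Euclidean distance, k retrieved cases, and adaptation by mean, median or inverse rank weighted mean) can be sketched in a few lines of Python, together with the accuracy indicators of Section 2.2. This is only an illustrative sketch, not the ANGEL or CBR-Works implementation: all function names are hypothetical, the weighting scheme (each squared difference multiplied by its weight) is an assumption, the Maximum measure is omitted, and the toy data at the end is invented.

```python
import math
from statistics import mean, median


def min_max_scale(cases):
    """Rescale each attribute so its minimum maps to 0 and its maximum to 1."""
    columns = list(zip(*cases))
    lows = [min(col) for col in columns]
    # Guard against constant attributes (max == min) to avoid division by zero.
    spans = [(max(col) - min(col)) or 1.0 for col in columns]
    return [[(v - lo) / span for v, lo, span in zip(row, lows, spans)]
            for row in cases]


def weighted_euclidean(a, b, weights):
    """Assumed weighting: weights of 1 everywhere give the unweighted distance."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def adapt(efforts, method="mean"):
    """Combine the efforts of the retrieved cases, ordered most similar first."""
    if method == "mean":
        return mean(efforts)
    if method == "median":
        return median(efforts)
    if method == "irwm":
        # Inverse rank weighted mean: closest case gets weight k, last gets 1.
        ranks = range(len(efforts), 0, -1)
        return sum(r * e for r, e in zip(ranks, efforts)) / sum(ranks)
    raise ValueError(f"unknown adaptation method: {method}")


def estimate(target, cases, efforts, weights, k=1, method="mean"):
    """Retrieve the k most similar cases and adapt their efforts."""
    order = sorted(range(len(cases)),
                   key=lambda i: weighted_euclidean(target, cases[i], weights))
    return adapt([efforts[i] for i in order[:k]], method)


def accuracy(actuals, predictions, level=0.25):
    """MMRE, MdMRE and Pred(level) computed from the individual MREs."""
    mres = [abs(a - p) / a for a, p in zip(actuals, predictions)]
    within = sum(1 for m in mres if m <= level)
    return {"MMRE": mean(mres), "MdMRE": median(mres),
            "Pred": 100.0 * within / len(mres)}


def leave_one_out(cases, efforts, weights, k=1, method="mean"):
    """Estimate each project from all the others (scaling uses the full set)."""
    scaled = min_max_scale(cases)
    predictions = []
    for i, target in enumerate(scaled):
        rest = scaled[:i] + scaled[i + 1:]
        rest_efforts = efforts[:i] + efforts[i + 1:]
        predictions.append(estimate(target, rest, rest_efforts, weights, k, method))
    return accuracy(efforts, predictions)


# Invented toy data: two attributes per project, effort in person hours.
projects = [[10, 2], [12, 3], [40, 9], [42, 8]]
hours = [100.0, 110.0, 400.0, 390.0]
print(leave_one_out(projects, hours, weights=[2, 1], k=1))
```

Keeping the distance, the number of retrieved cases and the adaptation method as separate arguments mirrors the way the study varies one parameter category at a time while holding the others fixed.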
Figure 2 - Boxplots of the residuals for Euclidean distances (UEK1, WEK1, UEK2, WEK2, UEK3, WEK3)

Consequently, the answer to our general question (will different combinations of parameter categories for the CBR technique generate statistically significantly different prediction accuracy?) was, at least for the dataset used, positive. Different combinations of parameter categories for the CBR technique gave statistically significantly different prediction accuracy.

Figure 3 - Boxplots of the residuals for Weighted Distances (WEK1, WEK2, WEK3)

Our next step was to choose the WE combination that gave the best prediction accuracy, and to assess whether different prediction accuracies would be statistically significant or not. To decide, we compared paired MREs for one, two and three retrieved cases using a T-test (Table 6). Boxplots for their residuals (Figure 3) confirmed the results obtained by the T-test, i.e., one retrieved case (the most similar) gave the best results, which were statistically significantly better than those for two and three retrieved cases. Consequently, the technique which gave the best

prediction accuracy used one retrieved case, based on a weighted Euclidean distance.

Table 6 - Comparison of Weighted Euclidean Distances

k=1 vs. k=2   k=1 vs. k=3   k=2 vs. k=3
-3.290 **     -3.290 **     0.29

** statistically significant at 1%

4 Conclusions

In this study we investigated two questions related to effort prediction models for Web hypermedia applications, which were:

1. Will different combinations of parameter categories for the CBR technique generate statistically significantly different prediction accuracy?
2. Which of the techniques employed in this study gives the most accurate predictions for the dataset?

In addressing the first question, our results show that the CBR technique which gave the most accurate results used a Weighted Euclidean distance similarity measure to retrieve a single most similar case (k=1). We do accept that our results may be dependent on the data set that we used, and future work will seek to extend the data sets that we use.

5 References

[1] M.J. Shepperd, C. Schofield, and B. Kitchenham, "Effort Estimation Using Analogy." Proc. ICSE-18, IEEE Computer Society Press, Berlin, 1996.
[2] A.R. Gray, and S.G. MacDonell, A comparison of model building techniques to develop predictive equations for software metrics. Information and Software Technology, 39: 425-437, 1997.
[3] L.C. Briand, K. El-Emam, D. Surmann, I. Wieczorek, and K.D. Maxwell, An Assessment and Comparison of Common Cost Estimation Modeling Techniques, Proceedings of ICSE 1999, Los Angeles, USA, pp. 313-322, 1999.
[4] Mendes, E., Mosley, N., and Counsell, S. Web Metrics - Estimating Design and Authoring Effort. IEEE Multimedia, Special Issue on Web Engineering, January-March, 50-57, 2001.
[5] M. Morisio, I. Stamelos, V. Spahos and D. Romano, Measuring Functionality and Productivity in Web-based applications: a Case Study, Proceedings of the Sixth International Software Metrics Symposium, 1999, pp. 111-118.
[6] D.J. Reifer, Web Development: Estimating Quick-to-Market Software, IEEE Software, November/December 2000, pp. 57-64.
[7] Cowderoy, Measures of size and complexity for web-site content, Proceedings of the Combined 11th European Software Control and Metrics Conference and the 3rd SCOPE conference on Software Product Quality, Munich, Germany, pp. 423-431.
[8] A.J.C. Cowderoy, A.J.M. Donaldson, J.O. Jenkins, A Metrics framework for multimedia creation, Proceedings of the 5th IEEE International Software Metrics Symposium, Maryland, USA, 1998.
[9] G. Kadoda, M. Cartwright, L. Chen, and M.J. Shepperd, Experiences Using Case-Based Reasoning to Predict Software Project Effort, Proceedings of the EASE 2000 Conference, Keele, UK, 2000.
[10] C.F. Kemerer, An Empirical Validation of Software Cost Estimation Models, Communications of the ACM, v.30:5, pp. 416-429.
[11] M.J. Shepperd, C. Schofield, and B. Kitchenham, "Effort Estimation Using Analogy." Proc. ICSE-18, IEEE Computer Society Press, Berlin, 1996.
[12] Myrtveit, and E. Stensrud, "A Controlled Experiment to Assess the Benefits of Estimating with Analogy and Regression Models," IEEE Transactions on Software Engineering, Vol. 25, No. 4, Jul./Aug. 1999, pp. 510-525.
[13] M.J. Shepperd, and C. Schofield, Estimating Software Project Effort Using Analogies. IEEE Transactions on Software Engineering, Vol. 23, No. 11, pp. 736-743, 1997.
[14] B.A. Kitchenham, L.M. Pickard, S.G. MacDonell, M.J. Shepperd, "What accuracy statistics really measure", IEE Proceedings - Software Engineering, June 2001, Vol. 148, Issue 3, p: 07.
[15] S. Conte, H. Dunsmore, and V. Shen, Software Engineering Metrics and Models. Benjamin/Cummings, Menlo Park, California, 1986.
[16] L.M. Pickard, B.A. Kitchenham, and S.J. Linkman, An investigation of analysis techniques for software datasets, Proceedings of the 6th International Symposium on Software Metrics (Metrics99), IEEE Computer Society Press, Los Alamitos, California, 1999.
[17] Schofield, C. An empirical investigation into software estimation by analogy, PhD thesis, Dept. of Computing, Bournemouth Univ., UK, (1998).
[18] Schulz S. CBR-Works - A State-of-the-Art Shell for Case-Based Application Building, Proceedings of the German Workshop on Case-Based Reasoning, GWCBR'99 (1999).
Lecture Notes in Artificial Intelligence, Springer-Verlag, 1995.
[19] Bergmann, R. Highlights of the INRECA Projects. In Case-Based Reasoning Research & Development. Aha, D. & Watson, I. (Eds.) pp. 1-15. Springer Lecture Notes in AI 2080. Berlin. 2001.
[20] L. Angelis, and I. Stamelos, A Simulation Tool for Efficient Analogy Based Cost Estimation, Empirical Software Engineering, 5, 35-68, 2000.