Building Better Parametric Cost Models
Based on the PMI PMBOK Guide, Fourth Edition
IPDI has been reviewed and approved as a provider of project management training by the Project Management Institute (PMI).

Parametric Cost Models
Topics will explore ways to build more predictive parametric cost models to increase your cost estimating accuracy and overall project success.
The workshop will:
- Discuss the qualities of a good cost estimate, based on the Johnson Space Center's Parametric Cost Estimating Handbook
- Explore how to build better parametric cost estimating models
- Include an exercise in developing a parametric cost estimating model

Version 4.1 Integrated Process Developers, Inc. Page: 1 For permissions and other rights

Estimates
- What is a synonym for the word Estimate?
- Estimates are an approximation of reality
  - They may be accurate approximations or poor ones
- A key challenge of estimation is to create accurate approximations
- To create accurate approximations, estimates must:
  - Be based on good data
  - Employ good estimation models and techniques
  - Be based on justifiable assumptions

Project Failure Rates
The Standish Group study (2009) of thousands of IT projects found:
- 32 percent were clear-cut successes
- 44 percent were challenged
- 24 percent were outright failures
- Challenged and failed projects combined: 68 percent
Anything that can improve our project performance should be pursued
Source: Standish Group, Chaos Summary 2009, 2009

Project Failure Rates
Frame's Study of Project Failure
Survey of 438 projects: To what extent did you meet the desired specifications on your projects?
- Fell short: 29%
- Met the specifications: 51%
- Performed better than required: 20%
- Met or exceeded the specifications: 71%

Project Failure Rates
Frame's Study of Project Failure, cont.
Survey of 438 projects: Please describe the cost performance you encountered on your last project:
- Serious cost overrun: 17%
- Modest cost overrun: 38%
- On target: 27%
- Modest cost underrun: 12%
- Major cost savings: 6%
- Cost overrun of any size: 55%; on target or better: 45%

Project Failure Rates
Frame's Study of Project Failure, cont.
Survey of 438 projects: To what extent did you meet your schedule target on your projects?
- Serious schedule slippage: 35%
- Modest schedule slippage: 34%
- On target: 22%
- Modestly ahead of schedule: 8%
- Dramatically ahead of schedule: 1%
- Schedule slippage of any size: 69%; on target or better: 31%

Project Failure Rates
Project success =
- On Scope (71% of the time)
- On Time (31% of the time)
- On Budget (45% of the time)
Result: about 10% of the projects are a success on all three
[Figure: triangle labeled Time, Cost, Scope]
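
The 10% result follows from multiplying the three survey shares. That multiplication treats scope, schedule, and cost outcomes as independent of one another, which is an assumption of this sketch rather than a finding reported by the survey. A minimal Python check, with variable names chosen only for illustration:

```python
# Shares of surveyed projects that were on target or better (from the survey slides).
on_scope = 0.71   # met or exceeded the specifications
on_time = 0.31    # on or ahead of schedule
on_budget = 0.45  # on or under budget

# Assumption: the three outcomes are independent, so the joint share is the product.
all_three = on_scope * on_time * on_budget
print(f"Success on scope, time, and budget together: {all_three:.1%}")
```

The product is about 9.9%, which the slide rounds to 10%. If the three outcomes are positively correlated in practice, the true joint share would be higher than this product.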

Two Reasons For Project Failures
- In the past, project cost overruns and schedule slippages have been attributed to failures of implementation
  - For example, the project team did a poor job executing the project
- Today it is evident that a large portion of project failures are actually failures of estimation
  - For example, a company agrees to an aggressive cost and schedule obligation from a customer in order to win the business

JSC: Parametric Cost Estimating Handbook
- Johnson Space Center (JSC) and the National Aeronautics and Space Administration (NASA) created the Parametric Cost Estimating Handbook
- Early in 1994, a joint Government/Industry Committee was formed to study ways to enhance the use of parametric cost estimating techniques
- The Committee found that the lack of training was one of the largest barriers to the use of parametrics
- The Committee sponsored the Handbook to provide training and background information on the use and evaluation of parametric tools

Parametric Estimating
- Involves using project characteristics (parameters) in a mathematical model to predict project costs
- Models may be simple or complex
- Both the cost and the accuracy of parametric models vary widely
- They are most likely to be reliable when:
  - The historical information used to develop the model was accurate
  - The parameters used in the model are readily quantifiable
  - The model is scalable (i.e., it works as well for a very large project as for a very small one)

Parametric Estimating, cont.
- Macro level:
  - 2,000 square foot house * $100/sf = $200,000
- Detail level:
  - Carpet: 1,000 sf of carpet * $1.00/sf = $1,000
  - High grade carpet: 1,000 sf * $1.00/sf * 1.25 = $1,250
  - Low grade carpet: 1,000 sf * $1.00/sf * 0.8 = $800
- Parametric Estimate = (Parameter Units) * (Cost per Parameter Unit) * (Scale Factor)
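
The parametric estimate formula above can be sketched as a small Python function. The function and argument names are illustrative, not part of the course material; the inputs are the house and carpet figures from the slide.

```python
def parametric_estimate(units: float, cost_per_unit: float, scale_factor: float = 1.0) -> float:
    """Parametric Estimate = (Parameter Units) * (Cost per Parameter Unit) * (Scale Factor)."""
    return units * cost_per_unit * scale_factor


# Macro level: the whole house priced per square foot.
house = parametric_estimate(2000, 100.00)

# Detail level: carpet priced per square foot, with a grade-dependent scale factor.
carpet = parametric_estimate(1000, 1.00)
high_grade_carpet = parametric_estimate(1000, 1.00, scale_factor=1.25)
low_grade_carpet = parametric_estimate(1000, 1.00, scale_factor=0.8)

for label, value in [("House", house), ("Carpet", carpet),
                     ("High grade carpet", high_grade_carpet),
                     ("Low grade carpet", low_grade_carpet)]:
    print(f"{label}: ${value:,.0f}")
```

Defaulting the scale factor to 1.0 keeps the macro-level and detail-level cases in one function.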

Cost Estimating Relationships (CERs)
- For CERs to be valid, they must be developed based on sound statistical concepts
- Once valid CERs have been developed, parametric cost modeling can proceed
- CERs are mathematical expressions relating cost as the dependent variable to one or more independent cost-driving variables
  - Dependent variable: The variable whose value is to be predicted
  - Independent variable: The variable about which knowledge is available or can be obtained
  - In other words, the dependent variable is dependent upon the value of the independent variables

Cost Estimating Relationships (CERs)
- When developing a CER, one must first theorize a logical estimating relationship between two or more variables
  - For example: Does it make sense to expect that fuel costs will increase as aircraft engine thrust increases?
- After developing a theorized relationship, one needs to assemble a database for the two or more variables
- Once the database is developed and a theory determined, one is ready to mathematically model the CER

Strengths and Weaknesses of CERs
Strengths
- They are quick and easy to use
  - Given a CER equation and the required input data, one can generally turn out an estimate quickly
- A CER can be used with limited system information
  - Consequently, CERs are especially useful in the early phases of a project
- A CER is an excellent (statistically sound) predictor if derived from a sound database, and can be relied upon to produce quality estimates
  - In other words: GIGO (garbage in, garbage out) still applies!

Strengths and Weaknesses of CERs
Weaknesses
- Sometimes too simplistic to forecast costs
  - Generally, if one has detailed information, the detail may be reliably used for estimates (Bottom-up Estimating)
- Problems with the estimating database may mean that a particular CER should not be used
- The cost estimator should:
  - Validate the CER data assumptions
  - Understand what the CER is supposed to estimate
  - Know what data were used to build that CER
  - Know how old the data are
  - Know how the data were normalized, etc.
- Never use a cost model without reviewing its source documentation

Curve Fitting
Two methods of curve fitting
- Graphical Curve Fitting
  - Plot the data and fit a smooth curve to the data
  - The objective in fitting the curve is to best-fit the curve to the data points plotted
  - Each data point plotted is equally important, and the curve you fit must consider each and every data point
[Figure: plot with X and Y axes]

Curve Fitting
Two methods of curve fitting, cont.
- Least Squares Best Fit (LSBF)
  - Use mathematical procedures to determine the one line which best fits the data set
  - The method does this by minimizing the sum of the squared deviations between the observed values of Y and the calculated values of Y
  - For example, if the distances $(Y_1 - Y_{C1}), (Y_2 - Y_{C2}), (Y_3 - Y_{C3}), (Y_4 - Y_{C4})$, etc., parallel to the Y-axis, are measured from the observed data points to the curve, then the LSBF line is the one that minimizes the following sum:
    $$(Y_1 - Y_{C1})^2 + (Y_2 - Y_{C2})^2 + (Y_3 - Y_{C3})^2 + \dots + (Y_n - Y_{Cn})^2$$
  - See graph on next slide
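
For a straight line, $Y_C = a + bX$, minimizing this sum has a standard closed-form solution. The derivation below is a textbook result added for illustration; the notation $S(a, b)$, $\bar{X}$, and $\bar{Y}$ (the sample means) is an assumption of this sketch and does not appear on the slides.

```latex
% Sum of squared deviations as a function of the intercept a and slope b
S(a, b) = \sum_{i=1}^{n} \left( Y_i - a - b X_i \right)^2

% Setting both partial derivatives to zero gives the normal equations
\frac{\partial S}{\partial a} = -2 \sum_{i=1}^{n} \left( Y_i - a - b X_i \right) = 0
\qquad
\frac{\partial S}{\partial b} = -2 \sum_{i=1}^{n} X_i \left( Y_i - a - b X_i \right) = 0

% Solving the two equations for b and a
b = \frac{\sum X_i Y_i - n \bar{X} \bar{Y}}{\sum X_i^2 - n \bar{X}^2}
\qquad
a = \bar{Y} - b \bar{X}
```

The second expression shows that the LSBF line always passes through the point of means $(\bar{X}, \bar{Y})$.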

Least Squares Best Fit (LSBF) Technique
- The Least Squares Best Fit (LSBF) technique uses regression analysis to define the mathematical relationship between two variables
- The association is determined in the form of a mathematical equation
- The equation provides the ability to predict one variable (the dependent variable) on the basis of knowledge of the other variable (the independent variable)

Least Squares Best Fit (LSBF) Technique
- The relationships between variables may be:
  - Linear
  - Curvilinear (discussed later)
- Linear relationships can be described graphically (on a common X-Y coordinate system) by a straight line and mathematically by the line formula y = a + bx, where:
  - y = the dependent variable
  - x = the independent variable
  - b = the slope of the line (the change in y divided by the change in x)
  - a = the y-intercept (the value of y when x = 0), which represents the fixed costs of work for that parameter
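
A minimal Python sketch of fitting y = a + bx by least squares, using the standard closed-form expressions for the slope and intercept. The function name and the four data points are hypothetical; the points were chosen to lie exactly on y = 3 + 2x so that the result is easy to check by eye.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    # Intercept: the fitted line passes through the point of means.
    a = y_bar - b * x_bar
    return a, b


# Hypothetical data lying exactly on y = 3 + 2x.
xs = [1, 2, 3, 4]
ys = [5, 7, 9, 11]

a, b = fit_line(xs, ys)
prediction = a + b * 5  # predicted y for a new x of 5
print(f"y = {a:.2f} + {b:.2f}x; predicted y at x = 5 is {prediction:.2f}")
```

With real cost data the points will not lie exactly on a line, and the same function returns the line with the smallest sum of squared deviations.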

Least Squares Best Fit (LSBF) Technique
[Figure: X-Y plot of observed points $Y_1$ to $Y_4$ and the corresponding calculated points $Y_{C1}$ to $Y_{C4}$ on the fitted line]
This course is going to work the simplest model, the straight line, which is expressed as: Y = a + bX

Least Squares Best Fit (LSBF) Technique
Multiple Regression
- Simple regression analysis
  - A single independent variable (X) is used to estimate the dependent variable (Y)
  - The relationship is assumed to be linear (a straight line)
  - This is the most common form of regression analysis
- More complex regression equations consider the effects of more than one independent variable (X) on the dependent variable (Y)
  - For example, automobile gasoline consumption may be largely explained by the number of miles driven
  - However, it might be better explained if we also considered factors such as the weight of the automobile, tire pressure, etc.

Least Squares Best Fit (LSBF) Technique
Multi-variable regression analysis
$$Y_c = A + B_1 X_1 + B_2 X_2$$
where:
- $Y_c$ = the calculated or estimated value for the dependent variable
- $A$ = the Y intercept, the value of $Y_c$ when $X_1$ and $X_2$ are both 0
- $X_1$ = the first independent (explanatory) variable
- $B_1$ = the slope of the line related to the change in $X_1$, the value by which $Y_c$ changes when $X_1$ changes by one (with $X_2$ held constant)
- $X_2$ = the second independent variable
- $B_2$ = the slope of the line related to the change in $X_2$, the value by which $Y_c$ changes when $X_2$ changes by one (with $X_1$ held constant)

Least Squares Best Fit (LSBF) Technique
Curvilinear Regression
- Applies when the relationship between the independent variable(s) and the dependent variable is not linear
  - Instead, a graph of the relationship on ordinary graph paper would depict a curve
- For example, improvement curve analysis uses a special form of curvilinear regression
[Figure: Curvilinear Correlation, X-Y plot]
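
The two-variable equation can be fitted with the same least squares idea by solving the normal equations. The sketch below uses only the Python standard library; the function names are illustrative, and the data are synthetic values generated from a known plane (Y = 2 + 0.04 X1 + 1.5 X2) so that the recovered coefficients can be checked. They are not real fuel consumption figures.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Augmented matrix so that row operations update both sides together.
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("independent variables are collinear; no unique fit")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution on the upper-triangular system.
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - known) / aug[row][row]
    return solution


def fit_two_variable(x1, x2, y):
    """Return (A, B1, B2) for the least squares fit Yc = A + B1*X1 + B2*X2."""
    rows = [(1.0, v1, v2) for v1, v2 in zip(x1, x2)]
    # Normal equations: (X^T X) * coefficients = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * target for r, target in zip(rows, y)) for i in range(3)]
    return tuple(solve_linear_system(xtx, xty))


# Synthetic data: X1 = miles driven, X2 = vehicle weight (tons), Y = gallons used.
miles = [100, 200, 300, 400, 500]
weight = [1.0, 2.0, 1.5, 3.0, 2.5]
gallons = [2 + 0.04 * m + 1.5 * w for m, w in zip(miles, weight)]

intercept, b1, b2 = fit_two_variable(miles, weight, gallons)
print(f"Yc = {intercept:.2f} + {b1:.3f}*X1 + {b2:.2f}*X2")
```

In practice a statistics package would normally do this step; the hand-rolled solver is shown only to make the mechanics visible. The collinearity check matters because two independent variables that move in lockstep cannot be given separate slopes.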

Goodness of Fit
- After the LSBF regression equations are developed, we need to determine how good the equation is
  - How good a forecast will we get by using our equation?
- Must consider a check for the goodness of fit: the Coefficient of Determination ($R^2$)
  - The coefficient of determination ($R^2$) represents the proportion of variation in the dependent variable that has been explained or accounted for by the regression line
  - The value of $R^2$ may vary from zero (0) to one (1)
  - $R^2 = 0$ indicates that none of the variation in Y is explained by the regression equation
  - $R^2 = 1$ indicates that 100% of the variation of Y has been explained by the regression equation

Goodness of Fit
[Figure: four X-Y scatter plots titled Strong Positive Correlation, Weak Positive Correlation, Weak Negative Correlation, and No Correlation]

Goodness of Fit
Coefficient of Determination ($R^2$)
- In order to calculate $R^2$ we need to use the equation:
  $$R^2 = \frac{\left(\sum xy - n\bar{x}\bar{y}\right)^2}{\left(\sum x^2 - \bar{x}\sum x\right)\left(\sum y^2 - \bar{y}\sum y\right)}$$
  where $n$ is the number of data points and $\bar{x}$ and $\bar{y}$ are the means of x and y
- $R^2$ tells us the proportion of total variation that is explained by the regression line

Goodness of Fit
Coefficient of Determination ($R^2$)
The Microsoft Excel approach (yes, there is an easier way!)
1. Enter the X & Y data into 2 adjacent columns
2. Select Excel's Chart Wizard
3. Select the XY (Scatter) diagram (top box)
4. After the scatter diagram has been created, go to the Chart pull-down menu and select Add Trendline, then select Linear
5. While in the Add Trendline menu, select Options and choose Display R-squared value on chart and Display equation on chart

Goodness of Fit
Hours of Study:  27  36  52  49  31  24  40  41  53  37
PMP Score:       84  93 153 174 117  79 142 126 163 107
[Figure: scatter chart "Correlation between Hours of Study & PMP Score", x-axis Hours of Study, y-axis PMP Score, with linear trendline y = 3.0382x + 5.3098 and $R^2$ = 0.836]

Goodness of Fit
Coefficient of Determination ($R^2$)
- $R^2$ is a relative measure of the goodness of fit of the observed data points to the regression line
  - For example, if $R^2$ = 0.70, this means that 70% of the total variation in the observed values of Y is explained by the observed values of X
  - Similarly, if $R^2$ = 0.50, then 50% of the variation in Y is explained by X
- If $R^2$ = 1.00, then the regression line perfectly fits all the observed data points
- As the fit becomes less accurate, less and less of the variation in Y is explained by Y's relation with X, which means that $R^2$ must decrease
- The lowest value of $R^2$ is 0, which means that none of the variation in Y is explained by the observed values of X
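
The trendline and $R^2$ shown on the Hours of Study chart can be reproduced without Excel. The sketch below applies the computational formula for $R^2$ (squared corrected cross-product divided by the product of the corrected sums of squares) to the ten data points from the slide; the function and variable names are illustrative.

```python
hours = [27, 36, 52, 49, 31, 24, 40, 41, 53, 37]
scores = [84, 93, 153, 174, 117, 79, 142, 126, 163, 107]


def fit_and_r_squared(x, y):
    """Return (intercept, slope, r_squared) for the least squares line through (x, y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)
    # Corrected cross-product and corrected sums of squares.
    s_xy = sum_xy - n * x_bar * y_bar
    s_xx = sum_x2 - x_bar * sum(x)
    s_yy = sum_y2 - y_bar * sum(y)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    r_squared = s_xy ** 2 / (s_xx * s_yy)
    return intercept, slope, r_squared


intercept, slope, r_squared = fit_and_r_squared(hours, scores)
print(f"y = {slope:.4f}x + {intercept:.4f}, R^2 = {r_squared:.3f}")
```

Run on the slide's data, this gives the same equation and $R^2$ as the chart (y = 3.0382x + 5.3098, $R^2$ = 0.836), so roughly 84% of the variation in score is explained by hours of study in this sample.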

Pitfalls of Parametric Estimates
Pitfalls to Avoid in Parametric Estimates
- Using the parametric model outside its database range
  - In forecasting beyond the database range, we do not know the shape of the curve, so there is more estimating risk involved
- Using a parametric model that has not been researched or validated
  - Regression and correlation analysis can in no way determine cause and effect
  - It is up to the analyst to do a logic check, determine an appropriate hypothesis, and analyze the database such that an assessment can be made regarding cause and effect
- Conditions can change
  - Make sure the conditions of the LSBF equation apply to the current project factors being estimated

Pitfalls of Parametric Estimates
Pitfalls to Avoid in Parametric Estimates, cont.
- Using a parametric model without access to realistic estimates of the independent variables' values for the product / effort being estimated
- Using a single point estimate for the independent variable value (X) rather than a most likely range, where a range is possible and practical
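
Two of these pitfalls, extrapolating beyond the database range and relying on a single point value of X, can be guarded against in code. The sketch below reuses the trendline from the Hours of Study example (y = 3.0382x + 5.3098, fitted on data from 24 to 53 hours). The function names and the low / most likely / high inputs are hypothetical.

```python
DATA_MIN_X, DATA_MAX_X = 24, 53     # range of Hours of Study in the example data
INTERCEPT, SLOPE = 5.3098, 3.0382   # trendline from the example chart


def predict(x):
    """Return (estimate, in_range); in_range is False when x is an extrapolation."""
    in_range = DATA_MIN_X <= x <= DATA_MAX_X
    return INTERCEPT + SLOPE * x, in_range


def predict_range(low, most_likely, high):
    """Evaluate a low / most likely / high range for X instead of a single point."""
    points = (("low", low), ("most likely", most_likely), ("high", high))
    return {label: predict(x) for label, x in points}


# Hypothetical inputs: the high case (70 hours) lies outside the fitted data range.
results = predict_range(30, 40, 70)
for label, (estimate, in_range) in results.items():
    note = "" if in_range else " (extrapolated beyond the data range: higher risk)"
    print(f"{label}: {estimate:.1f}{note}")
```

The flag does not make an out-of-range estimate wrong; it marks it as carrying more estimating risk, which is the point the slide makes.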

Parametric Estimate Exercise
In small groups, determine the cost for the new house
- Define the dependent variable (Y) (Hint: Total Build Costs)
- Determine the independent variable (X)
  - Explore the relationship between the independent and dependent variables
  - Determine the relationship that best predicts the dependent variable
- Estimate the cost of House #6

House Number   Total Build Costs   Number of Baths   Sq. Feet of House   Exterior Wall Surface
House #1       $266,500            2.5               2,800 sf            2,170 sf
House #2       $265,000            2.0               2,700 sf            2,250 sf
House #3       $268,000            3.0               2,860 sf            2,190 sf
House #4       $260,000            2.0               2,440 sf            1,990 sf
House #5       $257,000            2.0               1,600 sf            1,400 sf
House #6       ?                   2.5               2,600 sf            2,100 sf

Parametric Estimate Exercise Answer
[Figure: scatter chart "Number of Baths vs. Cost", x-axis Number of Baths, y-axis Cost, with linear trendline y = 7500x + 246250 and $R^2$ = 0.6081]

Parametric Estimate Exercise Answer
[Figure: scatter chart "Sq. Feet of House vs. Cost", x-axis Sq. Feet of House, y-axis Cost, with linear trendline y = 8.2087x + 243050 and $R^2$ = 0.751]

Parametric Estimate Exercise Answer
[Figure: scatter chart "Sq. Feet Exterior Wall vs. Cost", x-axis Sq. Feet of Exterior Wall, y-axis Cost, with linear trendline y = 6.9871x + 250303 and $R^2$ = 0.4016]
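
The three candidate trendlines can be compared side by side. The sketch below evaluates each chart's printed equation at House #6's values from the exercise table and picks the variable with the highest $R^2$. The dictionary names are illustrative, and using $R^2$ alone as the selection rule is a simplification assumed here; with only five data points it is a screening aid rather than proof of a better model.

```python
# (intercept, slope, R^2) as printed on the three answer charts.
models = {
    "Number of Baths": (246250, 7500, 0.6081),
    "Sq. Feet of House": (243050, 8.2087, 0.751),
    "Sq. Feet of Exterior Wall": (250303, 6.9871, 0.4016),
}

# House #6 values from the exercise table.
house_6 = {
    "Number of Baths": 2.5,
    "Sq. Feet of House": 2600,
    "Sq. Feet of Exterior Wall": 2100,
}

# Evaluate y = intercept + slope * x for each candidate independent variable.
estimates = {
    name: intercept + slope * house_6[name]
    for name, (intercept, slope, _r2) in models.items()
}

# Select the candidate whose regression line explains the most variation.
best = max(models, key=lambda name: models[name][2])

for name, value in estimates.items():
    marker = "  <- highest R^2" if name == best else ""
    print(f"{name}: ${value:,.0f}{marker}")
```

All three equations put House #6 within a few hundred dollars of $265,000, which is a useful cross-check in its own right: when independent models agree, the estimate is less dependent on any one of them.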

Parametric Estimate Exercise Answer
- The Square Feet of House is the best parametric estimating independent variable ($R^2$ = 0.751)
  - Line equation: y = 8.2087x + 243050
  - Therefore, at x = 2,600 sf, the estimated cost of the new house is: $264,400
- However, if we eliminate House #5, then $R^2$ is 0.9974 and the line equation is: y = 18.702x + 214381
  - Therefore, at x = 2,600 sf, the estimated cost of the new house is: $263,000

Parametric Estimate Exercise Answer
[Figure: scatter chart "Sq. Feet of House vs. Cost w/o House #5", x-axis Sq. Feet of House, y-axis Cost, with linear trendline y = 18.702x + 214381 and $R^2$ = 0.9974]

Building Better Parametric Cost Models: Additional Information
Based on the PMI PMBOK Guide, Fourth Edition
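
For the house exercise, the fit without House #5 can be reproduced from the exercise table. The sketch below fits cost against square feet for Houses #1 to #4 and evaluates the line at House #6's 2,600 sf; the function and variable names are illustrative.

```python
def fit_line(x, y):
    """Return (intercept, slope, r_squared) for the least squares line through (x, y)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope, s_xy ** 2 / (s_xx * s_yy)


# Houses #1 to #4 from the exercise table (House #5 excluded).
sq_feet = [2800, 2700, 2860, 2440]
costs = [266500, 265000, 268000, 260000]

intercept, slope, r_squared = fit_line(sq_feet, costs)
house_6 = intercept + slope * 2600  # House #6 is 2,600 sf

print(f"y = {slope:.3f}x + {intercept:.0f}, R^2 = {r_squared:.4f}")
print(f"House #6 estimate: ${house_6:,.0f}")
```

This matches the slide's equation (y = 18.702x + 214381, $R^2$ = 0.9974) and gives about $263,005, which the slide rounds to $263,000. Note that 2,600 sf falls inside the 2,440 to 2,860 sf range of the four remaining houses, so the estimate is an interpolation. Dropping a data point to raise $R^2$ should still be justified on grounds other than the improved fit.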

JSC: Parametric Cost Estimating Handbook
List of steps that should be taken in order to create good estimates:
- The purpose of the estimate is stated in writing
- The product to be produced is clearly described
- The tasks to be estimated are clearly identified
- A diversified group of people is involved in the estimate
- The validity of the estimate is demonstrated by past projects
- More than one cost model or estimating approach is used
- Potential cost & schedule impacts are estimated for all tasks
- Dictated schedules are analyzed for impacts on cost
- Estimates are updated whenever:
  - Changes to requirements affect cost or schedule
  - Constraints change
  - Actual values vary from plan
  - Tracking indicates that critical path tasks cannot be completed as planned

JSC: Good Estimating Environment
A good estimating environment requires that:
- Management acknowledges its responsibility for developing and sustaining an estimating capability
- The estimating function is funded
- Estimators have been equipped with the tools and training needed for reliable estimating
- The assigned estimators are experienced and capable
- The estimating capability of the organization is quantified, tracked, and evaluated
- Recognition and career paths exist such that qualified people want to become estimators

Parametric Estimating
- Historical cost and labor hours data are required as a basis for cost estimating
  - Parametric estimating is no exception
- Estimating data should be collected and maintained to allow for an audit trail
- Actual cost information and expenditure dates should be recorded so that cost estimates can be adjusted as required

Significant Adjustments to Parametric Data
- Consistent Scope
  - Adjustments are appropriate for differences in product scope between the historical data and the estimate being made
- Anomalies
  - Historical cost data should be adjusted for anomalies in the current project being estimated
- Improved Technology
  - Cost changes, due to changes in technology, are a matter of judgment and analysis
  - All bases for such adjustments should be documented and disclosed

Significant Adjustments to Parametric Data
- Inflation
  - There are no fixed ways to establish universal inflation indices (past, present or future) that fit all possible situations
- Learning Curve
  - The learning curve, as originally conceived, analyzes labor hours over successive production units of a manufactured item
- Production Rate
  - Changes in production rate (i.e., units/month) can be calculated in various ways

Calibration and Validation of Cost Models
Cost models, either internally developed or commercially purchased, need to be calibrated and validated for acceptance
The validation of a cost model includes the following steps:
1. Calibrate the model to historical cost data
   - Account for inflation, technological advances, etc.
2. Estimate the cost of past completed projects
   - Do the model's results match the actual costs of previous projects?
3. Compare the estimates with actual costs to demonstrate acceptable accuracy
   - Is the model accurate enough to estimate new projects?
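
Steps 2 and 3 of the validation can be sketched as a back-test against completed projects. Everything specific in this Python sketch is hypothetical: the model coefficients, the three past projects, and the 10% tolerance, since the slides do not define what counts as acceptable accuracy.

```python
def validate_model(model, history, tolerance=0.10):
    """Back-test a cost model against completed projects.

    history is a list of (parameter value, actual cost) pairs.
    Returns (mean absolute percentage error, True if every project is within tolerance).
    """
    errors = [abs(model(x) - actual) / actual for x, actual in history]
    mape = sum(errors) / len(errors)
    return mape, all(error <= tolerance for error in errors)


# Hypothetical calibrated model: cost = $50,000 + $120 per square foot.
def cost_model(sq_feet):
    return 50_000 + 120 * sq_feet


# Hypothetical completed projects: (square feet, actual cost).
past_projects = [(1000, 175_000), (1500, 225_000), (2000, 300_000)]

mape, acceptable = validate_model(cost_model, past_projects)
print(f"Mean absolute percentage error: {mape:.1%}; within tolerance: {acceptable}")
```

The tolerance is a policy choice for the organization; tightening it to 3% would reject this hypothetical model, because the third project misses by about 3.3%.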

Uncertainty and Risk Reduction
- A cost estimate has a probability of being within a given percentage of the correct answer
  - Since a model of the real world involves simplifications, the final actual cost will rarely equal the estimated cost
  - These modeling uncertainties translate into cost estimate risk
- Such uncertainties (risks) can be grouped into two major categories:
  - Uncertainty about any organization's ability to perform as planned, due to unforeseen events
  - Uncertainty associated with the development and usefulness of any cost model

Questions?