Chapter 8 Detecting and Circumventing Collinearity or Ill-Conditioning Problems
Section 8.1 Introduction
Multicollinearity / Collinearity / Ill-Conditioning
The terms multicollinearity, collinearity, and ill-conditioning all refer to the same phenomenon, namely near linear dependencies among the set of predetermined variables. Unlike serial correlation and heteroscedasticity, this issue does not involve the disturbance term of the econometric model, nor does it involve the dependent or endogenous variable. The problem deals exclusively with the set of explanatory variables. Because of near linear dependencies among right-hand side variables, it is difficult to disentangle their separate effects. Quite simply, the right-hand side variables display redundant information. This issue is not a statistical problem; it is a data problem. This issue is the Rodney Dangerfield of applied econometrics.
Outline
Collinearity: Nature of the Problem; Consequences
Belsley, Kuh, Welsch Diagnostics: Variance inflation factors; Condition indices; Variance-decomposition proportions
Circumvention of the Problem: Ridge Regression; Dropping or Combining Explanatory Variables
The Ballentine
[Venn diagram of overlapping circles for Y, X, and Z, with regions colored yellow, blue, red, green, brown, orange, and black.]
We may graphically depict the issue of collinearity with the Ballentine (see Peter Kennedy, 2009). In a multiple regression of Y on X and Z, the OLS estimators use the blue area to estimate β_X and the green area to estimate β_Z; the information in the red (overlap) area is discarded.
Formal Definition of Collinearity
A near linear dependency among the regressor variables:

\sum_{j=0}^{k} a_j X_j \approx 0,

where X_j is the j-th explanatory variable, j = 0, 1, 2, ..., k.
- Departure from orthogonality of the columns of the data matrix X
- (X^T X) approaches singularity
- The elements of (X^T X)^{-1} explode
Case 1 (orthogonal):
X^T X = [1 0; 0 1],  (X^T X)^{-1} = [1 0; 0 1]

Case 2 (non-orthogonal):
X^T X = [1 .9; .9 1],  (X^T X)^{-1} = [5.26 -4.74; -4.74 5.26]  (determinant = .19)

Case 3:
X^T X = [1 .99; .99 1],  (X^T X)^{-1} ≈ [50.25 -49.75; -49.75 50.25]  (determinant = .02)

Key Points: (1) Sampling variances of the estimated OLS coefficients increase sharply as the regressors become more collinear. (2) Sampling covariances of the OLS coefficients grow (in absolute value) as well.
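The three cases above can be reproduced numerically. This is an illustrative numpy sketch (not part of the chapter's SAS material) showing how the elements of (X^T X)^{-1} explode and the determinant collapses as the off-diagonal correlation r approaches 1:

```python
import numpy as np

# Cases 1-3: a 2x2 cross-product matrix with off-diagonal correlation r.
for r in (0.0, 0.9, 0.99):
    xtx = np.array([[1.0, r], [r, 1.0]])
    inv = np.linalg.inv(xtx)
    print(f"r = {r:4.2f}  det = {np.linalg.det(xtx):.4f}  (X'X)^-1 =")
    print(np.round(inv, 2))
```

At r = 0.99 the diagonal elements of the inverse are roughly fifty times their orthogonal-case values, which is exactly the variance inflation the key points describe.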
Ill-Conditioning / Multicollinearity / Collinearity
- Deals with specific characteristics of the data matrix X; a data problem, not a statistical problem
- In econometric applications, we often end up using regressors with high correlation
- Speak in terms of its severity rather than its existence or nonexistence
- Affects the structural integrity of econometric models
- The opposite of collinear is orthogonal
- Recall the dummy variable trap from Chapter 5; the dummy variable trap is an example of perfect collinearity.
- Constitutes a threat to proper specification and effective estimation of a structural relationship
- Larger variances (standard errors) of regression coefficients: VAR(β̂) = σ²(X^T X)^{-1}; these consequences are indistinguishable from those of inadequate variability in the regressors
- Covariances among parameter estimates often large and of the wrong sign
- Difficulties in interpretation
- Wide confidence regions for parameters
- Increase in Type II error (accepting H0 when H0 is false); that is, a decrease in the power of statistical tests (e.g., t, F)
Section 8.2 Collinearity Diagnostics
Collinearity Diagnostics
Multicollinearity refers to the presence of highly intercorrelated exogenous variables in regression models. It is not surprising that it is considered one of the most ubiquitous, significant, and difficult problems in applied econometrics, often referred to by modelers as "the familiar curse." Collinearity diagnostics measure how much regressors are related to other regressors and how these relationships affect the stability and variance of the regression estimates.
Signs of Collinearity
Signs of collinearity in a regression analysis include:
(1) Large standard errors on regression coefficients, so that estimates of the true model parameters become unstable and low t-values prevail.
(2) Parameter estimates that vary considerably from sample to sample.
(3) Drastic changes in the regression estimates after only minor data revisions.
(4) Conflicting conclusions from the usual tests of significance (such as the wrong sign for a parameter).
(5) Extreme correlations between pairs of variables.
(6) Omitting a variable from the equation results in smaller regression standard errors.
(7) A good fit that does not provide good forecasts.
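Sign (2) is easy to demonstrate by simulation. The following is a small Monte Carlo sketch with hypothetical data (not from the chapter's examples): when one regressor is nearly a copy of another, the OLS slope estimates swing wildly from sample to sample even though the true coefficient never changes.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 50, 500
estimates = []
for _ in range(reps):
    x1 = rng.normal(size=n)
    x2 = x1 + 0.05 * rng.normal(size=n)   # x2 is nearly a copy of x1
    y = 1.0 + 2.0 * x1 + 2.0 * x2 + rng.normal(size=n)
    X = np.column_stack([np.ones(n), x1, x2])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    estimates.append(b[1])                # OLS coefficient on x1
print("true beta1 = 2.0")
print("mean of estimates:", round(float(np.mean(estimates)), 2))
print("sampling std dev :", round(float(np.std(estimates)), 2))
```

The estimator remains unbiased on average, but its sampling standard deviation is larger than the coefficient itself, which is why low t-values and sign flips prevail with collinear data.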
We use collinearity diagnostics to: (1) produce a set of condition indices that signal the presence of one or more near dependencies among the variables. (Linear dependency, an extreme form of collinearity, occurs when there is an exact linear relationship among the variables). (2) uncover those variables that are involved in particular near dependencies and to assess the degree to which the estimated regression coefficients are degraded by the presence of the near dependencies. In practice, if one exogenous variable is highly correlated with the other explanatory variables, it is extremely unlikely that the exogenous variable in question contributes substantially to the prediction equation. 13
Variance Inflation Factors (VIFs)
The variance inflation factor VIF_i for variable i is defined as follows:

VIF_i = 1 / (1 - R_i²) = 1 / Tolerance_i,  where Tolerance_i = 1 - R_i²

and R_i² is the squared multiple correlation of exogenous variable i with the other exogenous variables. As R_i² approaches unity, the corresponding VIF becomes infinite. If the exogenous variables are orthogonal to each other (no correlation), each variance inflation factor is 1.0. VIFs > 10 signify potential collinearity problems. This indicator is a rule of thumb, devoid of any statistical clothing.
VIF_i thus provides us with a measure of how many times larger the variance of the i-th regression coefficient is for collinear data than for orthogonal data (where each VIF is 1.0). If the VIFs are not too much larger than 1.0, collinearity is not a problem. An advantage of knowing the VIF for each variable is that it gives the analyst a tangible idea of how much the variances of the estimated coefficients are degraded by collinearity.
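The VIF definition translates directly into code. This is an illustrative numpy sketch on hypothetical data (the chapter itself computes VIFs with SAS Proc Reg): each column is regressed on the remaining columns, and VIF_i = 1/(1 - R_i²) is applied.

```python
import numpy as np

def vifs(X):
    """VIF_i = 1 / (1 - R_i^2) from the auxiliary regression of column i
    on the remaining columns (intercept included)."""
    X = np.asarray(X, dtype=float)
    out = []
    for i in range(X.shape[1]):
        y = X[:, i]
        Z = np.column_stack([np.ones(len(y)), np.delete(X, i, axis=1)])
        resid = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
        out.append(1.0 / (1.0 - r2))
    return out

rng = np.random.default_rng(1)
x1 = rng.normal(size=200)
x2 = x1 + 0.1 * rng.normal(size=200)   # highly correlated with x1 -> large VIF
x3 = rng.normal(size=200)              # roughly orthogonal -> VIF near 1.0
v = vifs(np.column_stack([x1, x2, x3]))
print([round(t, 1) for t in v])
```

The correlated pair produces VIFs far above the rule-of-thumb cutoff of 10, while the orthogonal column stays near 1.0.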
Condition Indices
det(X^T X) = λ_1 λ_2 ... λ_p, where the λ's are the eigenvalues of X^T X. A small determinant implies that some (or many) of the eigenvalues are small.

Belsley, Kuh, Welsch diagnostic tools:

Condition number of the X matrix:  κ(X) = sqrt(λ_max / λ_min) = μ_max / μ_min
Condition index:  η_s = μ_max / μ_s,  s = 1, ..., p,

where μ_s = sqrt(λ_s) is the s-th singular value of X. The eigenvalues of X^T X are the squares of the singular values of X.
η_s = μ_max / μ_s, s = 1, ..., p, is the s-th condition index of the n×p data matrix X. The condition indices are the square roots of the ratios of the largest eigenvalue to each individual eigenvalue. The largest condition index is the condition number of the X matrix.

Key Points:
- There are as many near dependencies among the columns of a data matrix X as there are high condition indices.
- Weak dependencies are associated with condition indices around 5 or 10.
- Moderate to strong relations are associated with condition indices > 30.
- Again, these cutoffs are rules of thumb provided by Belsley, Kuh, and Welsch based on Monte Carlo experiments.
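Condition indices can be computed directly from the singular values of the data matrix. This is an illustrative numpy sketch on hypothetical data; following Belsley, Kuh, and Welsch, the columns are first scaled to unit length.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)      # one near dependency between columns
X = np.column_stack([np.ones(n), x1, x2])
X = X / np.linalg.norm(X, axis=0)        # BKW: scale each column to unit length
mu = np.linalg.svd(X, compute_uv=False)  # singular values, largest first
eta = mu[0] / mu                         # condition indices eta_s = mu_max / mu_s
print("condition indices:", np.round(eta, 1))
print("condition number :", round(float(eta.max()), 1))
```

One condition index far above 30 flags the single near dependency built into the data, consistent with the "as many near dependencies as high condition indices" rule.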
Variance-Decomposition Proportions
For each right-hand side variable, we can express the variance of the estimated coefficient as a function of the eigenvalues (singular values). Interest lies in the proportion of that variance associated with each individual eigenvalue (singular value). A degrading collinearity problem occurs when two or more variables have variance proportions greater than 0.5 in association with a condition index ≥ 30. It is quite possible to have more than one degrading collinearity problem among the explanatory variables.
Eigenvalues (p of them):       λ_max, λ_2, ..., λ_min
Singular values (p of them):   μ_max, μ_2, ..., μ_min   (μ_s = sqrt(λ_s))
Condition indices (p of them): η_s = μ_max / μ_s, running from μ_max/μ_max = 1 up to μ_max/μ_min

Condition index   VAR(β̂_0)   VAR(β̂_1)   ...   VAR(β̂_p)
η_1 = 1           Π_10        Π_11        ...   Π_1p
η_2               Π_20        Π_21        ...   Π_2p
 :                 :           :                 :
η_p               Π_p0        Π_p1        ...   Π_pp

The variance proportions for each coefficient must sum to 1:

\sum_{s=1}^{p} Π_sj = 1, for each j.

Diagnostic Procedure: a degrading near dependency is signaled when (1) η_s ≥ 30 and simultaneously (2) two or more Π_sj ≥ .5.

Source: Belsley, Kuh, and Welsch, Regression Diagnostics: Identifying Influential Data and Sources of Collinearity (1980), John Wiley & Sons.
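The table above can be computed from the SVD of the data matrix. This is an illustrative numpy sketch on hypothetical data, assuming the standard decomposition var(β̂_j) ∝ Σ_s v_js²/μ_s², where the v_js come from X = U·diag(μ)·V^T; Π_sj is the share contributed by singular value μ_s.

```python
import numpy as np

def variance_decomposition(X):
    X = X / np.linalg.norm(X, axis=0)           # BKW column scaling
    _, mu, Vt = np.linalg.svd(X, full_matrices=False)
    phi = (Vt.T ** 2) / mu ** 2                 # phi[j, s] = v_js^2 / mu_s^2
    pi = phi / phi.sum(axis=1, keepdims=True)   # each variable's shares sum to 1
    return mu[0] / mu, pi.T                     # condition indices; pi[s, j]

rng = np.random.default_rng(3)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)             # one built-in near dependency
X = np.column_stack([np.ones(n), x1, x2])
eta, pi = variance_decomposition(X)
print("condition indices:", np.round(eta, 1))
print("variance proportions (rows = components, cols = coefficients):")
print(np.round(pi, 3))
```

In the row belonging to the highest condition index, the proportions for both x1 and x2 exceed .5, which is exactly the BKW signal of a degrading collinear relation involving those two variables.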
Examples of Collinearity Diagnostics
Note that variables 1, 2, 4, 5, and 6 have VIFs greater than 10. Consequently, these variables are suspects in the case of collinearity.
Examples of Collinearity Diagnostics
Examination of the condition index column reveals three potentially degrading dependency situations. The condition number of this matrix is 43,275.043. Examination of the variance proportions, taken simultaneously with condition indices ≥ 30, reveals that (1) the intercept and variables 1, 4, and 5 are involved in one degrading collinearity situation, and (2) variables 2 and 6 form yet another degrading collinearity situation.
Section 8.3 Solutions to the Collinearity Problem
Solutions to the Collinearity Problem
Realize that the solutions to this ubiquitous problem are non-unique.
Based on the Belsley-Kuh-Welsch (BKW) diagnostics, we may uncover the nature of the collinear relationships among the explanatory variables and consider alternative solutions. That is, we may drop or combine right-hand side variables to mitigate collinearity situations. While straightforward, this solution loses information in the original structural specification; the structural integrity of the model is called into question.
Use of ridge regression: there is a bias-variance tradeoff; the ridge regression estimates are biased, but they typically provide a noteworthy reduction in the variance of the estimated parameters.
Ridge Regression
Hoerl and Kennard (1970). Minimize

L(β) = (Y - Xβ)^T (Y - Xβ) + k β^T β,  k > 0.

Setting ∂L/∂β = 0 yields the ridge estimator

β̂_R = (X^T X + kI)^{-1} X^T Y.

Properties:

E[β̂_R] = (X^T X + kI)^{-1} (X^T X) β
VAR(β̂_R) = σ² (X^T X + kI)^{-1} (X^T X) (X^T X + kI)^{-1}
β̂_R = (X^T X + kI)^{-1} (X^T X) β̂_OLS

MSE[β̂_R] = variance + (bias)². There always exists a positive number k such that MSE[β̂_R] < MSE[β̂_OLS].
k is called the biasing parameter; the greater the value of k, the greater the bias. Consequently, it is desirable to keep the value of k as small as possible. Another issue with ridge regression is the choice of k. Hoerl and Kennard recommend a ridge trace, a plot of the estimated coefficients as a function of k; the biasing parameter is chosen where the plot first begins to stabilize. A myriad of statistical studies exist regarding how to choose k. We prefer a ridge tableau, wherein both the parameter estimates and standard errors are reported for various values of k.
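The ridge estimator and a simple trace over a grid of k values can be sketched in a few lines. This is an illustrative numpy example on hypothetical standardized data, not the chapter's SAS RIDGE output; it applies k to the correlation-scale cross-product matrix, one common convention.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 60
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)          # collinear pair of regressors
y = x1 + x2 + rng.normal(size=n)
X = np.column_stack([x1, x2])
X = (X - X.mean(axis=0)) / X.std(axis=0)    # standardize regressors
yc = y - y.mean()

R = X.T @ X / n                             # correlation matrix of regressors
r = X.T @ yc / n
traces = {}
for k in (0.0, 0.01, 0.05, 0.1):
    traces[k] = np.linalg.solve(R + k * np.eye(2), r)   # ridge estimates
    print(f"k = {k:4.2f}  estimates = {np.round(traces[k], 3)}")
# Choose k where the plotted estimates first stabilize (the ridge trace idea).
```

Note that k = 0 reproduces OLS, and the norm of the coefficient vector shrinks monotonically as k grows, which is the stabilization a ridge trace displays.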
Example of a Ridge Tableau
[Tableau not reproduced.] Note that the estimated standard errors decrease with increases in the biasing parameter k.
Ridge Trace of Estimates for California Central Valley Region
The study region chosen for this application was the irrigated central California valley. County-level data were taken from the 1974 Census of Agriculture. A Cobb-Douglas production function was hypothesized, where output was the value of crops harvested ($/county) and the inputs were labor expenditures ($/county), value of machinery ($/county), quantity of irrigation water applied (acre-feet/county), energy expenditures ($/county), and miscellaneous expenditures ($/county). Not surprisingly, the OLS estimates reveal the usual problems (unexpected signs, large standard errors) associated with a collinear data set.
Collinearity Diagnostics with Proc Reg
Options in the MODEL statement:
COLLIN      requests the collinearity diagnostics; X^T X is scaled to read as correlations
COLLINOINT  the same diagnostics with the intercept adjusted out (no intercept)
TOL         tolerances, where TOL = 1/VIF
VIF         variance inflation factors
The RIDGE= option (on the PROC REG statement) requests ridge regression estimates.
Section 8.4 EXAMPLES
Example 1
Norton, G.W., J.D. Coffey, and E.B. Frye, "Estimating Returns to Agricultural Research, Extension, and Teaching at the State Level," Southern Journal of Agricultural Economics (July 1984): 121-128.
How many potentially degrading collinear situations exist? Which variables are involved in collinearity situations?
OLS and Selected Ridge Regression (RR) Estimates, Standard Errors, and Variance Inflation Factors for the Production Function Example

RHS Variable  K = 0 (OLS)               K = 0.01 (RR)    K = 0.02 (RR)   K = 0.03 (RR)
Intercept     1.793 a (3.997 b)         -2.3962 (1.282)  -2.700 (0.811)  -2.794 (0.599)
Expenses      0.915 (0.214)  190.94 c   0.380 (0.076)    0.286 (0.051)   0.243 (0.039)
Capital       0.035 (0.193)  198.46     0.090 (0.071)    0.100 (0.047)   0.102 (0.035)
Labor         0.379 (0.152)  23.19      0.158 (0.097)    0.108 (0.081)   0.082 (0.073)
Land          0.017 (0.537)  230.32     0.391 (0.173)    0.361 (0.113)   0.340 (0.085)
Rain          0.022 (0.007)  1.43       0.021 (0.007)    0.021 (0.007)   0.020 (0.007)

a Parameter estimate   b Estimate of standard error   c Variance Inflation Factor
OLS and Selected Ridge Regression (RR) Estimates, Standard Errors, and Variance Inflation Factors for the Production Function Example (continued)

RHS Variable  K = 0 (OLS)                  K = 0.01 (RR)      K = 0.02 (RR)      K = 0.03 (RR)
Research      0.00098 (0.00046)  111.58    0.00078 (0.00034)  0.00068 (0.00029)  0.00065 (0.00027)
Extension     -0.00259 (0.0034)  2190.05   -0.00167 (0.0014)  0.00043 (0.0011)   0.00106 (0.0009)
Teaching      0.00033 (0.00090)  2082.13   0.00035 (0.00074)  0.00051 (0.00062)  0.00057 (0.00055)

R² = 0.9907; adjusted R² = 0.9873; DW = 2.668 (OLS). Choice of k?
Dependent Variable: output Number of Observations Read 31 Number of Observations Used 31 Analysis of Variance Sum of Mean Source DF Squares Square F Value Pr > F Model 8 3.66782 0.45848 292.34 <.0001 Error 22 0.03450 0.00157 Corrected Total 30 3.70232 Root MSE 0.03960 R-Square 0.9907 Dependent Mean -5.15277 Adj R-Sq 0.9873 Coeff Var -0.76855 35 continued...
Parameter Estimates VIF s Parameter Standard Variance Variable DF Estimate Error t Value Pr > t Inflation Factor Intercept 1 1.79272 3.99666 0.45 0.6581 0 expenses 1 0.91523 0.21413 4.27 0.0003 190.94239 capital 1 0.03523 0.19296 0.18 0.8568 198.46058 labor 1 0.37947 0.15232 2.49 0.0208 23.19459 land 1 0.01717 0.53691 0.03 0.9748 230.32221 rain 1 0.02193 0.00702 3.12 0.0049 1.43321 research 1 0.00098410 0.00046096 2.13 0.0442 111.57799 extension 1-0.00259 0.00338-0.77 0.4522 2190.05447 teaching 1 0.00032776 0.00088125 0.37 0.7135 2082.12628 36 continued...
Collinearity Diagnostics Condition ----------------Proportion of Variation-------------- Number Eigenvalue Index Intercept expenses capital labor 1 7.72727 1.00000 5.287254E-8 5.859172E-7 4.278851E-7 0.00012142 2 0.93088 2.88116 3.092522E-9 7.026601E-8 5.343723E-8 0.00009794 3 0.30448 5.03773 4.626315E-8 6.634278E-7 2.85779E-7 0.01417 4 0.03719 14.41411 0.00000225 0.00006249 0.00003215 0.09049 5 0.00010695 268.79766 0.00185 0.02643 0.02883 0.58907 6 0.00004999 393.14511 0.00002350 0.36777 0.00282 0.02866 7 0.00002843 521.38946 0.00090764 0.21606 0.61911 0.00005683 8 0.00000244 1778.20200 0.99532 0.26556 0.00011949 0.16334 9 5.890935E-7 3621.76940 0.00190 0.12412 0.34908 0.11399 37 continued...
Collinearity Diagnostics -----------------------Proportion of Variation--------------------- Number land rain research extension teaching 1 1.96797E-7 0.00091787 0.00001779 2.801124E-8 1.508887E-8 2 5.597224E-9 0.68567 5.545047E-7 2.764641E-9 1.296649E-9 3 9.724858E-7 0.01675 0.00190 9.138614E-9 3.08534E-10 4 0.00000463 0.00111 0.01313 0.00000179 8.398391E-7 5 0.07079 0.03623 0.46279 0.00061062 0.00010954 6 0.00284 0.01707 0.03426 0.01114 0.00228 7 0.00812 0.10103 0.01730 0.00552 0.00058228 8 0.74092 0.05228 0.11957 0.01711 0.01316 9 0.17732 0.08895 0.35103 0.96561 0.98386 Condition Indices and Variance Proportions 38 continued...
The REG Procedure Dependent Variable: output Durbin-Watson D 2.668 Number of Observations 31 1st Order Autocorrelation -0.343 Dependent Variable: output Number of Observations Read 31 Number of Observations Used 31 Analysis of Variance Sum of Mean Source DF Squares Square F Value Pr > F Model 8 3.66782 0.45848 292.34 <.0001 Error 22 0.03450 0.00157 Corrected Total 30 3.70232 39 Root MSE 0.03960 R-Square 0.9907 Dependent Mean -5.15277 Adj R-Sq 0.9873 Coeff Var -0.76855 continued...
Parameter Estimates Parameter Standard Variable DF Estimate Error t Value Pr > t Intercept 1 1.79272 3.99666 0.45 0.6581 expenses 1 0.91523 0.21413 4.27 0.0003 capital 1 0.03523 0.19296 0.18 0.8568 labor 1 0.37947 0.15232 2.49 0.0208 land 1 0.01717 0.53691 0.03 0.9748 rain 1 0.02193 0.00702 3.12 0.0049 research 1 0.00098410 0.00046096 2.13 0.0442 extension 1-0.00259 0.00338-0.77 0.4522 teaching 1 0.00032776 0.00088125 0.37 0.7135 OLS parameter estimates 40 continued...
Durbin-Watson D 2.668 Number of Observations 31 1st Order Autocorrelation -0.343 Obs _MODEL TYPE DEPVAR RIDGE PCOMIT RMSE_ Intercept expenses 1 MODEL1 PARMS output.. 0.039602 1.79272 0.915 2 MODEL1 SEB output.. 0.039602 3.99666 0.214 3 MODEL1 RIDGEVIF output 0.000... 190.942 4 MODEL1 RIDGE output 0.000. 0.039602 1.79272 0.915 5 MODEL1 RIDGESEB output 0.000. 0.039602 3.99666 0.214 6 MODEL1 RIDGEVIF output 0.005... 37.275 7 MODEL1 RIDGE output 0.005. 0.043328-1.82980 0.496 8 MODEL1 RIDGESEB output 0.005. 0.043328 1.85868 0.104 9 MODEL1 RIDGEVIF output 0.010... 17.746 10 MODEL1 RIDGE output 0.010. 0.045866-2.39620 0.380 11 MODEL1 RIDGESEB output 0.010. 0.045866 1.28183 0.076 12 MODEL1 RIDGEVIF output 0.015... 10.601 13 MODEL1 RIDGE output 0.015. 0.047478-2.60024 0.321 14 MODEL1 RIDGESEB output 0.015. 0.047478 0.99021 0.060 15 MODEL1 RIDGEVIF output 0.020... 7.114 16 MODEL1 RIDGE output 0.020. 0.048591-2.69977 0.286 17 MODEL1 RIDGESEB output 0.020. 0.048591 0.81061 0.051 41 Parameter estimates from Ridge Regression continued...
Obs _MODEL TYPE DEPVAR RIDGE PCOMIT RMSE_ Intercept expenses 18 MODEL1 RIDGEVIF output 0.025... 5.133 19 MODEL1 RIDGE output 0.025. 0.049414-2.75706 0.261 20 MODEL1 RIDGESEB output 0.025. 0.049414 0.68803 0.044 21 MODEL1 RIDGEVIF output 0.030... 3.894 22 MODEL1 RIDGE output 0.030. 0.050055-2.79370 0.243 23 MODEL1 RIDGESEB output 0.030. 0.050055 0.59881 0.039 24 MODEL1 RIDGEVIF output 0.035... 3.066 25 MODEL1 RIDGE output 0.035. 0.050576-2.81890 0.22995 26 MODEL1 RIDGESEB output 0.035. 0.050576 0.53089 0.03465 27 MODEL1 RIDGEVIF output 0.040... 2.48265 28 MODEL1 RIDGE output 0.040. 0.051014-2.83718 0.21928 29 MODEL1 RIDGESEB output 0.040. 0.051014 0.47745 0.03145 30 MODEL1 RIDGEVIF output 0.045... 2.05661 31 MODEL1 RIDGE output 0.045. 0.051393-2.85101 0.21061 32 MODEL1 RIDGESEB output 0.045. 0.051393 0.43432 0.02884 33 MODEL1 RIDGEVIF output 0.050... 1.73540 34 MODEL1 RIDGE output 0.050. 0.051728-2.86181 0.20342 35 MODEL1 RIDGESEB output 0.050. 0.051728 0.39879 0.02667 36 MODEL1 RIDGEVIF output 0.055... 1.48699 37 MODEL1 RIDGE output 0.055. 0.052031-2.87050 0.19734 38 MODEL1 RIDGESEB output 0.055. 0.052031 0.36903 0.02483 39 MODEL1 RIDGEVIF output 0.060... 1.29077 40 MODEL1 RIDGE output 0.060. 0.052308-2.87763 0.19212 42 continued...
41 MODEL1 RIDGESEB output 0.060. 0.052308 0.34376 0.02326 42 MODEL1 RIDGEVIF output 0.065... 1.13296 43 MODEL1 RIDGE output 0.065. 0.052566-2.88361 0.18759 44 MODEL1 RIDGESEB output 0.065. 0.052566 0.32205 0.02189 45 MODEL1 RIDGEVIF output 0.070... 1.00404 46 MODEL1 RIDGE output 0.070. 0.052809-2.88871 0.18360 47 MODEL1 RIDGESEB output 0.070. 0.052809 0.30320 0.02071 48 MODEL1 RIDGEVIF output 0.075... 0.89731 49 MODEL1 RIDGE output 0.075. 0.053040-2.89312 0.18006 50 MODEL1 RIDGESEB output 0.075. 0.053040 0.28670 0.01966 51 MODEL1 RIDGEVIF output 0.080... 0.80788 52 MODEL1 RIDGE output 0.080. 0.053261-2.89699 0.17690 53 MODEL1 RIDGESEB output 0.080. 0.053261 0.27215 0.01873 54 MODEL1 RIDGEVIF output 0.085... 0.73217 55 MODEL1 RIDGE output 0.085. 0.053474-2.90044 0.17405 56 MODEL1 RIDGESEB output 0.085. 0.053474 0.25922 0.01790 57 MODEL1 RIDGEVIF output 0.090... 0.66746 58 MODEL1 RIDGE output 0.090. 0.053681-2.90353 0.17146 59 MODEL1 RIDGESEB output 0.090. 0.053681 0.24767 0.01716 60 MODEL1 RIDGEVIF output 0.095... 0.61169 61 MODEL1 RIDGE output 0.095. 0.053882-2.90634 0.16910 62 MODEL1 RIDGESEB output 0.095. 0.053882 0.23730 0.01649 63 MODEL1 RIDGEVIF output 0.100... 0.56326 64 MODEL1 RIDGE output 0.100. 0.054078-2.90891 0.16693 65 MODEL1 RIDGESEB output 0.100. 0.054078 0.22794 0.01588 43
Obs capital labor land rain research extension teaching output 1 0.035 0.3795 0.017 0.02193 0.001-0.00 0.00-1 2 0.193 0.1523 0.537 0.00702 0.000 0.00 0.00-1 3 198.461 23.1946 230.322 1.43321 111.578 2190.05 2082.13-1 4 0.035 0.3795 0.017 0.02193 0.001-0.00 0.00-1 5 0.193 0.1523 0.537 0.00702 0.000 0.00 0.00-1 6 40.263 10.2612 39.373 1.08501 26.092 19.31 20.52-1 7 0.071 0.2086 0.394 0.02175 0.000-0.00-0.00-1 8 0.095 0.1108 0.243 0.00668 0.000 0.00 0.00-1 9 19.778 6.9856 17.927 1.03106 14.831 9.56 9.54-1 10 0.090 0.1576 0.391 0.02140 0.000-0.00-0.00-1 11 0.071 0.0968 0.173 0.00690 0.000 0.00 0.00-1 12 11.784 5.3421 10.393 1.00386 9.811 5.98 5.80-1 13 0.097 0.1283 0.375 0.02105-0.000-0.00-0.00-1 14 0.056 0.0876 0.137 0.00704 0.000 0.00 0.00-1 15 7.830 4.3753 6.824 0.98544 7.088 4.14 3.96-1 16 0.100 0.1084 0.361 0.02076-0.000-0.00-0.00-1 17 0.047 0.0812 0.113 0.00714 0.000 0.00 0.00-1 18 5.584 3.7464 4.839 0.97087 5.433 3.05 2.89-1 19 0.101 0.0936 0.349 0.02051-0.000-0.00-0.00-1 20 0.040 0.0764 0.097 0.00721 0.000 0.00 0.00-1 21 4.186 3.3067 3.618 0.95831 4.346 2.35 2.21-1 22 0.102 0.0818 0.340 0.02029-0.000-0.00-0.00-1 23 0.035 0.0727 0.085 0.00726 0.000 0.00 0.00-1 24 3.256 2.9819 2.812 0.94692 3.590 1.87 1.75-1 44 continued...
Obs capital labor land rain research extension teaching output 25 0.10192 0.07209 0.33251 0.02010-0.00028-0.00042-0.00012-1 26 0.03157 0.06975 0.07576 0.00729 0.00011 0.00013 0.00003-1 27 2.60720 2.73142 2.25095 0.93631 3.04145 1.52251 1.42557-1 28 0.10202 0.06378 0.32618 0.01992-0.00030-0.00043-0.00012-1 29 0.02849 0.06734 0.06837 0.00731 0.00010 0.00011 0.00003-1 30 2.13582 2.53136 1.84466 0.92623 2.62888 1.26623 1.18382-1 31 0.10201 0.05651 0.32083 0.01976-0.00032-0.00044-0.00012-1 32 0.02598 0.06530 0.06236 0.00732 0.00009 0.00011 0.00003-1 33 1.78273 2.36694 1.54077 0.91654 2.30963 1.07070 1.00004-1 34 0.10194 0.05004 0.31624 0.01961-0.00033-0.00044-0.00013-1 35 0.02389 0.06356 0.05736 0.00733 0.00009 0.00010 0.00003-1 36 1.51139 2.22861 1.30744 0.90718 2.05662 0.91807 0.85698-1 37 0.10183 0.04420 0.31225 0.01946-0.00034-0.00045-0.00013-1 38 0.02212 0.06203 0.05315 0.00734 0.00008 0.00009 0.00002-1 39 1.29836 2.10993 1.12433 0.89807 1.85200 0.79660 0.74336-1 40 0.10170 0.03887 0.30875 0.01933-0.00035-0.00045-0.00013-1 41 0.02061 0.06068 0.04955 0.00734 0.00008 0.00009 0.00002-1 42 1.12805 2.00643 0.97796 0.88919 1.68358 0.69833 0.65160-1 43 0.10155 0.03396 0.30564 0.01920-0.00035-0.00045-0.00013-1 44 0.01931 0.05947 0.04644 0.00734 0.00008 0.00008 0.00002-1 45 0.98975 1.91493 0.85909 0.88050 1.54286 0.61769 0.57640-1 46 0.10140 0.02940 0.30286 0.01907-0.00036-0.00046-0.00013-1 47 0.01817 0.05836 0.04373 0.00734 0.00007 0.00008 0.00002-1 48 0.87590 1.83309 0.76121 0.87199 1.42370 0.55069 0.51399-1 45 continued...
Obs capital labor land rain research extension teaching output 49 0.10124 0.02513 0.30036 0.01895-0.00036-0.00046-0.00013-1 50 0.01717 0.05735 0.04134 0.00733 0.00007 0.00007 0.00002-1 51 0.78105 1.75918 0.67966 0.86364 1.32160 0.49441 0.46162-1 52 0.10108 0.02113 0.29808 0.01883-0.00037-0.00046-0.00013-1 53 0.01628 0.05642 0.03923 0.00733 0.00007 0.00007 0.00002-1 54 0.70120 1.69187 0.61097 0.85543 1.23322 0.44667 0.41722-1 55 0.10092 0.01735 0.29600 0.01871-0.00037-0.00046-0.00013-1 56 0.01549 0.05555 0.03734 0.00732 0.00007 0.00007 0.00002-1 57 0.63333 1.63012 0.55257 0.84737 1.15598 0.40584 0.37926-1 58 0.10076 0.01377 0.29409 0.01860-0.00037-0.00047-0.00013-1 59 0.01478 0.05474 0.03565 0.00732 0.00006 0.00006 0.00002-1 60 0.57517 1.57314 0.50250 0.83944 1.08792 0.37062 0.34655-1 61 0.10061 0.01037 0.29232 0.01849-0.00037-0.00047-0.00013-1 62 0.01413 0.05397 0.03412 0.00731 0.00006 0.00006 0.00002-1 63 0.52495 1.52027 0.45924 0.83164 1.02750 0.34005 0.31815-1 64 0.10045 0.00713 0.29068 0.01838-0.00038-0.00047-0.00013-1 65 0.01355 0.05325 0.03274 0.00730 0.00006 0.00006 0.00001-1 46
EXAMPLE 2 Demand for Fish Meal
Dependent Variable: lnfm Number of Observations Read 25 Number of Observations Used 25 Analysis of Variance Sum of Mean Source DF Squares Square F Value Pr > F Model 7 2.09853 0.29979 17.34 <.0001 Error 17 0.29384 0.01728 Corrected Total 24 2.39236 Root MSE 0.13147 R-Square 0.8772 Dependent Mean 6.26764 Adj R-Sq 0.8266 Coeff Var 2.09761 Parameter Estimates Parameter Standard Variance Variable DF Estimate Error t Value Pr > t Inflation 48 Intercept 1 6.88920 1.68501 4.09 0.0008 0 lnpfm 1-0.84312 0.23804-3.54 0.0025 6.32576 lnpsm 1 0.39172 0.18414 2.13 0.0483 5.01391 lnpbs 1 0.61181 0.29767 2.06 0.0555 11.70794 lnpcorn 1-0.17393 0.32123-0.54 0.5952 5.83986 lnpbc 1-0.88913 0.44536-2.00 0.0622 11.72640 lnfm1 1 0.33336 0.15507 2.15 0.0463 3.98512 t 1 0.01808 0.01201 1.50 0.1507 10.85462
Collinearity Diagnostics Condition ------------------Proportion of Variation------------------- Number Eigenvalue Index Intercept lnpfm lnpsm lnpbs lnpcorn 1 7.38780 1.00000 0.00000441 0.00000904 0.00002006 0.00001622 0.00085782 2 0.37886 4.41589 0.00000557 0.00000369 0.00001380 0.00000467 0.13405 3 0.22929 5.67626 0.00003724 0.00002272 0.00002043 0.00032758 0.04680 4 0.00183 63.60184 0.00156 0.02486 0.35542 0.08324 0.00753 5 0.00112 81.33166 0.00847 0.01494 0.02616 0.16151 0.53925 6 0.00051769 119.46021 0.01197 0.16789 0.15607 0.66209 0.00145 7 0.00040635 134.83621 0.00967 0.57327 0.43614 0.06606 0.00904 8 0.00018368 200.55366 0.96829 0.21901 0.02616 0.02676 0.26102 Conclusion from these diagnostics? 49
The REG Procedure Dependent Variable: lnfm Collinearity Diagnostics ---------Proportion of Variation--------- Number lnpbc lnfm1 t 1 0.00001043 0.00001326 0.00034660 2 5.620623E-7 0.00004397 0.01404 3 0.00017413 0.00004960 0.06672 4 0.03162 0.00205 0.44054 5 0.01512 0.42444 0.37241 6 0.32995 0.07260 0.00339 7 0.45734 0.00025086 0.10238 8 0.16578 0.50055 0.00017126 Durbin-Watson D 2.080 Number of Observations 25 1st Order Autocorrelation -0.040 Conclusion from these diagnostics? 50 NOTE: No serial correlation problem from the Durbin-Watson statistic.
FISH MEAL EXAMPLE

RHS Variable  K = 0 (OLS)         K = 0.05 (RR)       K = 0.10 (RR)
INTERCEPT     6.88920 (1.68501)   6.75466 (1.35990)   6.73146 (1.19065)
LNPFM         -0.84312 (0.23804)  -0.60201 (0.17276)  -0.48065 (0.14141)
LNPSM         0.39172 (0.18414)   0.25735 (0.14394)   0.18855 (0.12268)
LNPBS         0.61181 (0.29767)   0.25034 (0.17659)   0.11490 (0.13273)
LNPCORN       -0.17393 (0.32123)  -0.17628 (0.23331)  -0.19168 (0.19580)
LNPBC         -0.88913 (0.44536)  -0.60176 (0.26337)  -0.49828 (0.19792)
LNFM1         0.33336 (0.15507)   0.32543 (0.13131)   0.30791 (0.11724)
t             0.01808 (0.01201)   0.01151 (0.00726)   0.00930 (0.00554)

R² = 0.8772; adjusted R² = 0.8266; DW = 2.080 (OLS). Use of the ridge trace: choice of k?
SAS Output concerning the Use of Ridge Regression Obs _MODEL TYPE DEPVAR RIDGE PCOMIT RMSE_ Intercept lnfm1 1 MODEL1 PARMS lnfm.. 0.13147 6.88920 0.33336 2 MODEL1 SEB lnfm.. 0.13147 1.68501 0.15507 3 MODEL1 RIDGEVIF lnfm 0.000... 3.98512 4 MODEL1 RIDGE lnfm 0.000. 0.13147 6.88920 0.33336 5 MODEL1 RIDGESEB lnfm 0.000. 0.13147 1.68501 0.15507 6 MODEL1 RIDGEVIF lnfm 0.005... 3.76771 7 MODEL1 RIDGE lnfm 0.005. 0.13171 6.86629 0.33411 8 MODEL1 RIDGESEB lnfm 0.005. 0.13171 1.62983 0.15106 9 MODEL1 RIDGEVIF lnfm 0.010... 3.57255 10 MODEL1 RIDGE lnfm 0.010. 0.13232 6.84598 0.33427 11 MODEL1 RIDGESEB lnfm 0.010. 0.13232 1.58415 0.14777 12 MODEL1 RIDGEVIF lnfm 0.015... 3.39574 13 MODEL1 RIDGE lnfm 0.015. 0.13316 6.82812 0.33398 14 MODEL1 RIDGESEB lnfm 0.015. 0.13316 1.54517 0.14498 15 MODEL1 RIDGEVIF lnfm 0.020... 3.23439 16 MODEL1 RIDGE lnfm 0.020. 0.13413 6.81251 0.33334 17 MODEL1 RIDGESEB lnfm 0.020. 0.13413 1.51104 0.14253 18 MODEL1 RIDGEVIF lnfm 0.025... 3.08630 19 MODEL1 RIDGE lnfm 0.025. 0.13519 6.79894 0.33242 20 MODEL1 RIDGESEB lnfm 0.025. 0.13519 1.48054 0.14033 21 MODEL1 RIDGEVIF lnfm 0.030... 2.94972 22 MODEL1 RIDGE lnfm 0.030. 0.13629 6.78718 0.33129 23 MODEL1 RIDGESEB lnfm 0.030. 0.13629 1.45280 0.13831 24 MODEL1 RIDGEVIF lnfm 0.035... 2.82324 53 continued...
Obs _MODEL TYPE DEPVAR RIDGE PCOMIT RMSE_ Intercept lnfm1 25 MODEL1 RIDGE lnfm 0.035. 0.13742 6.77705 0.33000 26 MODEL1 RIDGESEB lnfm 0.035. 0.13742 1.42723 0.13642 27 MODEL1 RIDGEVIF lnfm 0.040... 2.70572 28 MODEL1 RIDGE lnfm 0.040. 0.13854 6.76836 0.32857 29 MODEL1 RIDGESEB lnfm 0.040. 0.13854 1.40342 0.13464 30 MODEL1 RIDGEVIF lnfm 0.045... 2.59619 31 MODEL1 RIDGE lnfm 0.045. 0.13964 6.76094 0.32704 32 MODEL1 RIDGESEB lnfm 0.045. 0.13964 1.38105 0.13295 33 MODEL1 RIDGEVIF lnfm 0.050... 2.49385 34 MODEL1 RIDGE lnfm 0.050. 0.14073 6.75466 0.32543 35 MODEL1 RIDGESEB lnfm 0.050. 0.14073 1.35990 0.13131 36 MODEL1 RIDGEVIF lnfm 0.055... 2.39799 37 MODEL1 RIDGE lnfm 0.055. 0.14180 6.74938 0.32376 38 MODEL1 RIDGESEB lnfm 0.055. 0.14180 1.33978 0.12974 39 MODEL1 RIDGEVIF lnfm 0.060... 2.30802 40 MODEL1 RIDGE lnfm 0.060. 0.14283 6.74499 0.32205 41 MODEL1 RIDGESEB lnfm 0.060. 0.14283 1.32058 0.12821 42 MODEL1 RIDGEVIF lnfm 0.065... 2.22342 43 MODEL1 RIDGE lnfm 0.065. 0.14384 6.74137 0.32031 44 MODEL1 RIDGESEB lnfm 0.065. 0.14384 1.30219 0.12673 45 MODEL1 RIDGEVIF lnfm 0.070... 2.14373 46 MODEL1 RIDGE lnfm 0.070. 0.14481 6.73845 0.31855 47 MODEL1 RIDGESEB lnfm 0.070. 0.14481 1.28451 0.12528 48 MODEL1 RIDGEVIF lnfm 0.075... 2.06854 54 continued...
Obs _MODEL TYPE DEPVAR RIDGE PCOMIT RMSE_ Intercept lnfm1 49 MODEL1 RIDGE lnfm 0.075. 0.14576 6.73613 0.31677 50 MODEL1 RIDGESEB lnfm 0.075. 0.14576 1.26750 0.12386 51 MODEL1 RIDGEVIF lnfm 0.080... 1.99751 52 MODEL1 RIDGE lnfm 0.080. 0.14667 6.73435 0.31499 53 MODEL1 RIDGESEB lnfm 0.080. 0.14667 1.25109 0.12248 54 MODEL1 RIDGEVIF lnfm 0.085... 1.93030 55 MODEL1 RIDGE lnfm 0.085. 0.14756 6.73305 0.31321 56 MODEL1 RIDGESEB lnfm 0.085. 0.14756 1.23523 0.12113 57 MODEL1 RIDGEVIF lnfm 0.090... 1.86663 58 MODEL1 RIDGE lnfm 0.090. 0.14841 6.73216 0.31144 59 MODEL1 RIDGESEB lnfm 0.090. 0.14841 1.21989 0.11981 60 MODEL1 RIDGEVIF lnfm 0.095... 1.80625 61 MODEL1 RIDGE lnfm 0.095. 0.14924 6.73165 0.30967 62 MODEL1 RIDGESEB lnfm 0.095. 0.14924 1.20504 0.11851 63 MODEL1 RIDGEVIF lnfm 0.100... 1.74891 64 MODEL1 RIDGE lnfm 0.100. 0.15004 6.73146 0.30791 65 MODEL1 RIDGESEB lnfm 0.100. 0.15004 1.19065 0.11724 55
Obs lnpbs lnpfm lnpsm lnpbc lnpcorn t lnfm 56 1 0.6118-0.84312 0.39172-0.8891-0.17393 0.0181-1 2 0.2977 0.23804 0.18414 0.4454 0.32123 0.0120-1 3 11.7079 6.32576 5.01391 11.7264 5.83986 10.8546-1 4 0.6118-0.84312 0.39172-0.8891-0.17393 0.0181-1 5 0.2977 0.23804 0.18414 0.4454 0.32123 0.0120-1 6 9.9827 5.74526 4.64773 9.9849 5.25603 9.2789-1 7 0.5502-0.80838 0.37277-0.8397-0.17100 0.0169-1 8 0.2754 0.22727 0.17762 0.4117 0.30531 0.0111-1 9 8.6212 5.24617 4.32409 8.6137 4.77377 8.0380-1 10 0.4976-0.77695 0.35547-0.7976-0.16946 0.0159-1 11 0.2571 0.21817 0.17211 0.3842 0.29231 0.0104-1 12 7.5274 4.81323 4.03620 7.5141 4.36847 7.0421-1 13 0.4522-0.74835 0.33963-0.7612-0.16890 0.0151-1 14 0.2417 0.21030 0.16733 0.3611 0.28139 0.0098-1 15 6.6349 4.43475 3.77868 6.6184 4.02291 6.2298-1 16 0.4125-0.72219 0.32505-0.7296-0.16902 0.0144-1 17 0.2286 0.20334 0.16309 0.3414 0.27201 0.0093-1 18 5.8969 4.10164 3.54715 5.8788 3.72466 5.5579-1 19 0.3775-0.69816 0.31160-0.7017-0.16964 0.0137-1 20 0.2172 0.19710 0.15927 0.3243 0.26380 0.0088-1 21 5.2794 3.80671 3.33806 5.2606 3.46456 4.9951-1 22 0.3464-0.67599 0.29914-0.6771-0.17062 0.0132-1 23 0.2072 0.19143 0.15576 0.3092 0.25650 0.0084-1 24 4.7573 3.54417 3.14845 4.7385 3.23567 4.5186-1 continued...
Obs lnpbs lnpfm lnpsm lnpbc lnpcorn t lnfm 57 25 0.31858-0.65546 0.28758-0.65518-0.17183 0.01268-1 26 0.19833 0.18623 0.15252 0.29591 0.24992 0.00810-1 27 4.31170 3.30932 2.97585 4.29330 3.03267 4.11130-1 28 0.29356-0.63638 0.27681-0.63552-0.17322 0.01225-1 29 0.19035 0.18142 0.14948 0.28396 0.24393 0.00779-1 30 3.92823 3.09832 2.81821 3.91047 2.85140 3.76007-1 31 0.27092-0.61861 0.26675-0.61780-0.17471 0.01186-1 32 0.18314 0.17695 0.14664 0.27317 0.23842 0.00751-1 33 3.59572 2.90797 2.67376 3.57875 2.68854 3.45487-1 34 0.25034-0.60201 0.25735-0.60176-0.17628 0.01151-1 35 0.17659 0.17276 0.14394 0.26337 0.23331 0.00726-1 36 3.30544 2.73561 2.54102 3.28932 2.54144 3.18779-1 37 0.23154-0.58646 0.24853-0.58717-0.17789 0.01119-1 38 0.17059 0.16883 0.14138 0.25440 0.22855 0.00702-1 39 3.05043 2.57900 2.41870 3.03521 2.40793 2.95258-1 40 0.21430-0.57186 0.24025-0.57384-0.17951 0.01090-1 41 0.16507 0.16512 0.13895 0.24616 0.22409 0.00681-1 42 2.82514 2.43625 2.30570 2.81081 2.28623 2.74424-1 43 0.19843-0.55812 0.23245-0.56163-0.18113 0.01064-1 44 0.15998 0.16162 0.13662 0.23856 0.21990 0.00661-1 45 2.62507 2.30573 2.20106 2.61161 2.17486 2.55872-1 46 0.18377-0.54517 0.22510-0.55039-0.18274 0.01040-1 47 0.15526 0.15830 0.13439 0.23151 0.21593 0.00642-1 48 2.44653 2.18607 2.10395 2.43392 2.07258 2.39272-1 continued...
Obs lnpbs lnpfm lnpsm lnpbc lnpcorn t lnfm 49 0.17018-0.53294 0.21816-0.54003-0.18432 0.01018-1 50 0.15086 0.15514 0.13224 0.22495 0.21216 0.00625-1 51 2.28650 2.07608 2.01363 2.27471 1.97834 2.24352-1 52 0.15756-0.52136 0.21160-0.53043-0.18587 0.00997-1 53 0.14676 0.15213 0.13019 0.21883 0.20859 0.00609-1 54 2.14248 1.97473 1.92946 2.13145 1.89126 2.10886-1 55 0.14580-0.51039 0.20538-0.52153-0.18738 0.00979-1 56 0.14292 0.14927 0.12821 0.21311 0.20517 0.00594-1 57 2.01236 1.88111 1.85089 2.00207 1.81056 1.98686-1 58 0.13482-0.49998 0.19948-0.51324-0.18886 0.00961-1 59 0.13931 0.14653 0.12630 0.20774 0.20191 0.00580-1 60 1.89439 1.79446 1.77740 1.88478 1.73560 1.87594-1 61 0.12454-0.49008 0.19388-0.50551-0.19029 0.00945-1 62 0.13592 0.14392 0.12445 0.20268 0.19879 0.00567-1 63 1.78708 1.71409 1.70856 1.77812 1.66579 1.77476-1 64 0.11490-0.48065 0.18855-0.49828-0.19168 0.00930-1 65 0.13273 0.14141 0.12268 0.19792 0.19580 0.00554-1 58
EXAMPLE 3: Demand Function for Frosted Flakes, 20 oz. Size (Retailer: Publix)
[Exhibits for Example 3 (figures and SAS screenshots not reproduced here): the ridge trace for this example; OLS parameter estimates; collinearity diagnostics (Conclusion from the collinearity diagnostics?); and SAS output concerning the use of ridge regression.]
Section 8.5 Commentary
Commentary
(1) Collinearity is not a problem of existence/nonexistence.
(2) Collinearity is ubiquitous and involves the data matrix X.
(3) Use the diagnostics of Belsley, Kuh, and Welsch to determine the number of degrading collinear relationships among the explanatory variables.
(4) Based on the aforementioned diagnostics, the solutions to this issue involve the use of ridge regression, combining similar explanatory variables, or omitting some explanatory variables from the econometric model specification.