Detecting and Circumventing Collinearity or Ill-Conditioning Problems

1 Chapter 8 Detecting and Circumventing Collinearity or Ill-Conditioning Problems

2 Section 8.1 Introduction

3 Multicollinearity/Collinearity/Ill-Conditioning The terms multicollinearity, collinearity, and ill-conditioning all refer to the same phenomenon, namely near linear dependencies among the set of predetermined variables. Unlike serial correlation and heteroscedasticity, this issue does not involve the disturbance term of the econometric model, nor does it involve the dependent or endogenous variable. The problem deals exclusively with the set of explanatory variables. Because of near linear dependencies among right-hand side variables, it is difficult to disentangle their separate effects. Quite simply, the right-hand side variables display redundant information. This issue is not a statistical problem; it is a data problem. This issue is the Rodney Dangerfield of applied econometrics.

4 Collinearity
Nature of the Problem
Consequences
Belsley, Kuh, Welsch Diagnostics: variance inflation factors, condition indices, variance-decomposition proportions
Circumvention of the Problem: ridge regression; dropping or combining explanatory variables

5 The Ballentine [Venn diagram of the areas shared by Y, X, and Z (yellow, blue, red, green, brown, orange, black); figure not recoverable from transcription.] We may graphically depict the issue of collinearity (see Peter Kennedy, 2009). In the multiple regression of Y on X and Z, OLS uses the blue area to estimate β_X and the green area to estimate β_Z; the information in the red area, where X and Z overlap, is discarded.

6 Formal Definition of Collinearity A near linear dependency among the regressor variables:
Σ_{j=0}^{k} a_j X_j ≈ 0, with the a_j not all zero,
where X_j is the j-th explanatory variable, j = 0, 1, 2, ..., k. Equivalently, a departure from orthogonality of the columns of the data matrix X: X^T X approaches singularity, and the elements of (X^T X)^{-1} explode.

7 Orthogonal Case versus Non-orthogonal Case [slide compares X^T X and (X^T X)^{-1} for an orthogonal and a non-orthogonal data matrix; the numeric matrices were lost in transcription, but an illustrative sketch follows below]. Key Points: (1) Sampling variances of the estimated OLS coefficients increase sharply as the columns of X depart from orthogonality. (2) Greater sampling covariances for the OLS coefficients.
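Since the slide's matrices did not survive transcription, here is a minimal SAS/IML sketch with illustrative numbers of our own (not the slide's originals) showing the same contrast: an orthogonal design yields a well-behaved (X^T X)^{-1}, while a nearly collinear design makes its elements explode.

proc iml;
   /* Orthogonal case: the two columns of X are uncorrelated */
   X1 = {1 1, 1 -1, -1 1, -1 -1};
   /* Near-collinear case: the second column nearly duplicates the first */
   X2 = {1 1.01, 1 0.99, -1 -1.02, -1 -0.98};
   print (t(X1)*X1)[label="X'X (orthogonal)"] (inv(t(X1)*X1))[label="inverse"];
   print (t(X2)*X2)[label="X'X (near-collinear)"] (inv(t(X2)*X2))[label="inverse"];
   /* The near-collinear inverse has entries on the order of 1000;
      the sampling variances and covariances blow up accordingly */
quit;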

8 Ill-Conditioning/Multicollinearity/Collinearity deals with specific characteristics of the data matrix X; it is a data problem, not a statistical problem. In econometric applications, we often end up using regressors with high correlation. Speak in terms of the severity of collinearity rather than of its existence or nonexistence. Collinearity affects the structural integrity of econometric models. The opposite of collinear is orthogonal. Recall the dummy variable trap from Chapter 5: the dummy variable trap is an example of perfect collinearity.

9 Collinearity constitutes a threat to proper specification and effective estimation of a structural relationship. Consequences: larger variances (standard errors) of regression coefficients, since VAR(β̂) = σ²(X^T X)^{-1}; these consequences are indistinguishable from those of inadequate variability in the regressors. Covariances among parameter estimates are often large and of the wrong sign. Difficulties in interpretation. Confidence regions for parameters are wide. Increase in Type II error (accepting H0 when H0 is false), that is, a decrease in the power of statistical tests (e.g., t, F).

10 Section 8.2 Collinearity Diagnostics

11 Collinearity Diagnostics Multicollinearity refers to the presence of highly intercorrelated exogenous variables in regression models. It is not surprising that it is considered one of the most ubiquitous, significant, and difficult problems in applied econometrics, often referred to by modelers as the familiar curse. Collinearity diagnostics measure how much regressors are related to other regressors and how these relationships affect the stability and variance of the regression estimates.

12 Signs of Collinearity Signs of collinearity in a regression analysis include:
(1) Large standard errors on regression coefficients, so that estimates of the true model parameters become unstable and low t-values prevail.
(2) Parameter estimates that vary considerably from sample to sample.
(3) Drastic changes in the regression estimates after only minor data revisions.
(4) Conflicting conclusions from the usual tests of significance (such as the wrong sign for a parameter).
(5) Extreme correlations between pairs of variables.
(6) Omitting a variable from the equation results in smaller regression standard errors.
(7) A good fit that does not provide good forecasts.

13 We use collinearity diagnostics to:
(1) produce a set of condition indices that signal the presence of one or more near dependencies among the variables. (Linear dependency, an extreme form of collinearity, occurs when there is an exact linear relationship among the variables.)
(2) uncover those variables that are involved in particular near dependencies and to assess the degree to which the estimated regression coefficients are degraded by the presence of the near dependencies.
In practice, if one exogenous variable is highly correlated with the other explanatory variables, it is extremely unlikely that the exogenous variable in question contributes substantially to the prediction equation.

14 Variance Inflation Factors (VIFs) The variance inflation factor for variable i is defined as
VIF_i = 1 / (1 − R_i²) = 1 / Tolerance_i, where Tolerance_i = 1 − R_i²
and R_i² is the squared multiple correlation from regressing the i-th exogenous variable on the other exogenous variables. As R_i² approaches unity, the corresponding VIF becomes infinite. If the exogenous variables are orthogonal to each other (no correlation), each variance inflation factor is 1.0. VIFs > 10 signify potential collinearity problems. This indicator is a rule of thumb, devoid of any statistical clothing. continued...

15 VIF_i thus provides us with a measure of how many times larger the variance of the i-th regression coefficient is for collinear data than for orthogonal data (where each VIF is 1.0). If the VIFs are not too much larger than 1.0, collinearity is not a problem. An advantage of knowing the VIF for each variable is that it gives the analyst a tangible idea of how much the variances of the estimated coefficients are degraded by collinearity.
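In SAS, VIFs and tolerances are requested with the VIF and TOL options in the MODEL statement of PROC REG; the data set and variable names below are placeholders, a minimal sketch rather than a specific application.

proc reg data=mydata;
   /* VIF and TOL append variance inflation factors and tolerances
      to the parameter-estimates table */
   model y = x1 x2 x3 x4 / vif tol;
run;
quit;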

16 Condition Indices det(X^T X) = λ_1 λ_2 ... λ_p, the product of the eigenvalues of X^T X. A small determinant means some (or many) of the eigenvalues are small. Belsley, Kuh, Welsch diagnostic tools: the condition number of the X matrix, κ(X) = µ_MAX / µ_MIN = sqrt(λ_MAX / λ_MIN), and the condition indices, η_s = µ_MAX / µ_s, s = 1, ..., p, where µ_s = sqrt(λ_s) is the s-th singular value. The eigenvalues of X^T X are the squares of the singular values of X.

17 η_s = µ_MAX / µ_s, s = 1, ..., p, is the s-th condition index of the n×p data matrix X. The condition indices are the square roots of the ratios of the largest eigenvalue to each individual eigenvalue. The largest condition index is the condition number of the X matrix. Key Points: There are as many near dependencies among the columns of a data matrix X as there are high condition indices. Weak dependencies are associated with condition indices around 5 or 10. Moderate to strong relations are associated with condition indices > 30. Again, these cutoffs are rules of thumb provided by Belsley, Kuh, and Welsch based on Monte Carlo experiments.

18 Variance Decomposition Proportions For each right-hand side variable, we can express the variance of the estimated coefficient as a function of the eigenvalues (singular values). Interest lies in the proportion of the variance associated with individual eigenvalues (singular values). A degrading collinearity problem occurs when two or more variables have variance proportions greater than 0.5 in association with a condition index ≥ 30. It is quite possible to have more than one degrading collinearity problem among the explanatory variables.

19 The BKW display pairs each eigenvalue λ_s of X^T X (equivalently each singular value µ_s = sqrt(λ_s), s = 1, ..., p) and its condition index η_s = µ_MAX / µ_s with a row of variance-decomposition proportions Π_s0, Π_s1, ..., Π_sp, one column for each of VAR(β̂_0), VAR(β̂_1), ..., VAR(β̂_p). The variance proportions in each column must sum to 1: Σ_{s=1}^{p} Π_sj = 1, j = 1, 2, ..., p. Diagnostic Procedure: flag a degrading dependency when (1) η_s ≥ 30 and simultaneously (2) two or more Π_sj ≥ 0.5. Source: Belsley, Kuh, Welsch, Regression Diagnostics: Identifying Influential Data and Sources of Collinearity (1980), John Wiley & Sons.
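The table above can be computed directly. Here is a sketch in SAS/IML, with placeholder data set and variable names (mydata, x1-x3), that scales the columns of X to unit length as BKW recommend, extracts the singular values, and forms the condition indices and variance-decomposition proportions.

proc iml;
   use mydata;  read all var {x1 x2 x3} into X;  close mydata;
   X = j(nrow(X), 1, 1) || X;            /* prepend the intercept column */
   X = X * diag(1/sqrt(X[##,]));         /* scale each column to unit length */
   call svd(U, Q, V, X);                 /* Q holds the singular values mu_s */
   condindex = max(Q) / Q;               /* eta_s = mu_MAX / mu_s */
   phi = (V##2) * diag(1/(Q##2));        /* phi_js = v_js**2 / mu_s**2 */
   /* Pi_sj = share of VAR(beta_j) tied to singular value s; columns sum to 1 */
   vdp = t( phi / repeat(phi[,+], 1, ncol(phi)) );
   print condindex, vdp;
quit;

Rows of vdp with a condition index of 30 or more and two or more proportions above 0.5 flag degrading dependencies.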

20 Examples of Collinearity Diagnostics Note that variables 1, 2, 4, 5, and 6 have VIFs that are greater than 10. Consequently, these variables are suspects in the case of collinearity.

21 Examples of Collinearity Diagnostics Examination of the condition index column reveals three potentially degrading dependency situations. The condition number of this matrix is about 43. Examining the variance proportions simultaneously with condition indices ≥ 30 reveals that (1) the intercept and variables 1, 4, and 5 are involved in one degrading collinearity situation, and (2) variables 2 and 6 form yet another degrading collinearity situation.

22 Section 8.3 Solutions to the Collinearity Problem

23 Solutions to the Collinearity Problem Realize that the solutions to this ubiquitous problem are non-unique. Based on the Belsley-Kuh-Welsch (BKW) diagnostics, we may uncover the nature of the collinear relationships among the explanatory variables. That is, we may drop or combine right-hand side variables to mitigate collinearity situations. While straightforward, this solution loses information in the original structural specification, and the structural integrity of the model is called into question. Alternatively, use ridge regression. There are bias-variance tradeoffs with ridge regression: the ridge estimates are biased, but they typically provide a noteworthy reduction in the variance of the estimated parameters.

24 Ridge Regression (Hoerl and Kennard, 1970) Minimize L(β, k) = (Y − Xβ)^T (Y − Xβ) + k β^T β, k > 0. Setting dL/dβ = 0 yields the ridge estimator
β̂_R = (X^T X + kI)^{-1} X^T Y.
Its moments are
E[β̂_R] = (X^T X + kI)^{-1} X^T X β
VAR[β̂_R] = σ² (X^T X + kI)^{-1} X^T X (X^T X + kI)^{-1}
and the ridge estimator is a shrunken version of OLS: β̂_R = (X^T X + kI)^{-1} X^T X β̂_OLS. There always exists a positive number k such that
MSE[β̂_R] < MSE[β̂_OLS], where MSE = variance + (bias)². continued...
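A direct SAS/IML evaluation of the estimator just derived makes the algebra concrete. The data set and variable names are placeholders, and in practice the regressors are typically centered and scaled before k is applied.

proc iml;
   use mydata;
   read all var {x1 x2 x3} into x;  read all var {y} into y;
   close mydata;
   k = 0.02;                                     /* an assumed biasing parameter */
   xpx = t(x) * x;
   b_ols   = solve(xpx, t(x)*y);                 /* (X'X)^{-1} X'Y */
   b_ridge = solve(xpx + k*i(ncol(x)), t(x)*y);  /* (X'X + kI)^{-1} X'Y */
   print b_ols b_ridge;
quit;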

25 k is called the biasing parameter; the greater the value of k, the greater the bias. Consequently, it is desirable to keep the value of k as small as possible. Another issue with ridge regression is the choice of k. Hoerl and Kennard recommend a ridge trace, a plot of the estimated coefficients as a function of k. The biasing parameter is chosen where the plot first begins to stabilize. A myriad of statistical studies exist regarding how to choose k. Our preference is for a ridge tableau, wherein both the parameter estimates and standard errors are reported for various values of k.
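In PROC REG, a grid of k values is supplied with the RIDGE= option in the PROC REG statement; OUTSEB and OUTVIF write the corresponding standard errors and VIFs to the OUTEST= data set, which then serves as the ridge tableau. A sketch with placeholder names:

proc reg data=mydata outest=ridge_est outseb outvif
         ridge=0 to 0.05 by 0.005;
   model y = x1 x2 x3;
   /* the traditional PLOT statement can draw the ridge trace */
   plot / ridgeplot nomodel nostat;
run;
quit;

proc print data=ridge_est;   /* the tableau: estimates, SEs, VIFs by k */
run;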

26 Example of a Ridge Tableau [tableau entries lost in transcription]. Note that the estimated standard errors decrease as the biasing parameter k increases.

27 Ridge Trace of Estimates for California Central Valley Region The study region chosen for this application was the irrigated central California valley. County-level data were taken from the 1974 Census of Agriculture. A Cobb-Douglas production function was hypothesized, where output was the value of crops harvested ($/county) and the inputs were labor expenditures ($/county), value of machinery ($/county), quantity of irrigation water applied (acre-feet/county), energy expenditures ($/county), and miscellaneous expenditures ($/county). Not surprisingly, the OLS estimates reveal the usual problems (unexpected signs, large standard errors) associated with a collinear data set. continued...

28 [Slide figure not recoverable from transcription; per the preceding slide's title, presumably the ridge trace of estimates for the California Central Valley region.]

29 Collinearity Diagnostics with Proc Reg MODEL-statement options: COLLIN, COLLINOINT (diagnostics with the intercept removed), TOL, and VIF. With the COLLIN option, X^T X is scaled so that it reads as a correlation matrix. TOL = 1/VIF. Ridge regression is requested with the RIDGE= option in the PROC REG statement.
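Putting the diagnostic options together in one MODEL statement (placeholder data set and variables); COLLINOINT repeats the diagnostics with the intercept adjusted out, which is appropriate when the intercept is not of substantive interest.

proc reg data=mydata;
   /* request VIFs, tolerances, and the BKW condition-index /
      variance-proportion tables with and without the intercept */
   model y = x1 x2 x3 / vif tol collin collinoint;
run;
quit;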

30 Section 8.4 EXAMPLES

31 Example 1 Norton, G.W., J.D. Coffey, and E.B. Frye, "Estimating Returns to Agricultural Research, Extension, and Teaching at the State Level," Southern Journal of Agricultural Economics (July 1984). How many potentially degrading collinear situations exist? Which variables are involved in collinearity situations?

32 OLS and Selected Ridge Regression (RR) Estimates, Standard Errors, and Variance Inflation Factors for the Production Function Example [parameter estimates and most VIFs lost in transcription; surviving standard errors shown in parentheses]

RHS Variable   K = 0 (OLS)   K = 0.01 (RR)   K = 0.02 (RR)   K = 0.03 (RR)
Intercept      (3.997)       (1.282)         (0.811)         (0.599)
Expenses       (0.214)       (0.076)         (0.051)         (0.039)
Capital        (0.193)       (0.071)         (0.047)         (0.035)
Labor          (0.152)       (0.097)         (0.081)         (0.073)
Land           (0.537)       (0.173)         (0.113)         (0.085)
Rain           (0.007)       (0.007)         (0.007)         (0.007)   VIF = 1.43

33 OLS and Selected Ridge Regression (RR) Estimates, Standard Errors, and Variance Inflation Factors for the Production Function Example (continued) [most entries lost in transcription; surviving standard errors shown in parentheses]

RHS Variable   K = 0 (OLS)   K = 0.01 (RR)   K = 0.02 (RR)   K = 0.03 (RR)
Research       ( )           ( )             ( )             ( )
Extension      (0.0034)      (0.0014)        (0.0011)        (0.0009)
Teaching       ( )           ( )             ( )             ( )

R² and DW statistics reported on the slide were lost in transcription. Choice of k?

34 [Slide figure not recoverable from transcription.]

35 Dependent Variable: output. Number of Observations Read: 31; Used: 31. [PROC REG analysis-of-variance table: Model Pr > F < .0001; the remaining entries (sums of squares, Root MSE, R-Square, Adj R-Sq, Coeff Var) were lost in transcription.] continued...

36 Parameter Estimates with VIFs [table of parameter estimates, standard errors, t values, and variance inflation factors for Intercept, expenses, capital, labor, land, rain, research, extension, and teaching; numeric entries lost in transcription.] continued...

37 Collinearity Diagnostics [table of eigenvalues, condition indices, and proportions of variation for Intercept, expenses, capital, and labor; numeric entries lost in transcription.] continued...

38 Collinearity Diagnostics (continued) [proportions of variation for land, rain, research, extension, and teaching; numeric entries lost in transcription.] Condition Indices and Variance Proportions. continued...

39 The REG Procedure, Dependent Variable: output. [Durbin-Watson D and 1st-order autocorrelation for the fit, 31 observations, followed by the analysis-of-variance table of a second run (Model Pr > F < .0001); numeric entries lost in transcription.] continued...

40 Parameter Estimates [OLS parameter estimates, standard errors, and t values for Intercept, expenses, capital, labor, land, rain, research, extension, and teaching; numeric entries lost in transcription.] continued...

41-46 [Durbin-Watson D and 1st-order autocorrelation for 31 observations, followed by the SAS OUTEST= listing from the ridge regression run: for each value of the biasing parameter (_RIDGE_), rows of _TYPE_ = RIDGE (parameter estimates), RIDGESEB (standard errors), and RIDGEVIF (variance inflation factors) for Intercept, expenses, capital, labor, land, rain, research, extension, and teaching; numeric entries lost in transcription.]

47 EXAMPLE 2 Demand for Fish Meal

48 Dependent Variable: lnfm. Number of Observations Read: 25; Used: 25. [PROC REG analysis-of-variance table (Model Pr > F < .0001) and parameter estimates with standard errors, t values, and variance inflation factors for Intercept, lnpfm, lnpsm, lnpbs, lnpcorn, lnpbc, lnfm1, and t; numeric entries lost in transcription.]

49 Collinearity Diagnostics [eigenvalues, condition indices, and proportions of variation for Intercept, lnpfm, lnpsm, lnpbs, and lnpcorn; numeric entries lost in transcription.] Conclusion from these diagnostics?

50 The REG Procedure, Dependent Variable: lnfm. Collinearity Diagnostics (continued) [proportions of variation for lnpbc, lnfm1, and t; numeric entries lost in transcription.] Durbin-Watson D and 1st-order autocorrelation reported for 25 observations. Conclusion from these diagnostics? NOTE: No serial correlation problem is indicated by the Durbin-Watson statistic.

51 [Slide figure not recoverable from transcription.]

52 FISH MEAL EXAMPLE [OLS and ridge regression (RR) estimates with standard errors in parentheses for Intercept, LNPFM, LNPSM, LNPBS, LNPCORN, LNPBC, LNFM1, and t at K = 0 (OLS), K = 0.05 (RR), and K = 0.10 (RR), together with R² and DW; numeric entries lost in transcription.] Use of the ridge trace. Choice of k?

53-58 SAS Output concerning the Use of Ridge Regression [OUTEST= listing for the fish meal model: for each _RIDGE_ value, rows of _TYPE_ = RIDGE (parameter estimates), RIDGESEB (standard errors), and RIDGEVIF (variance inflation factors) for Intercept, lnfm1, lnpbs, lnpfm, lnpsm, lnpbc, lnpcorn, and t; numeric entries lost in transcription.]

59 EXAMPLE 3 Demand Function for Frosted Flakes 20 oz. Size Retailer: Publix

60-72 [Example 3 slides: ridge trace for this example; OLS parameter estimates; collinearity diagnostics; conclusion from the collinearity diagnostics?; SAS output concerning the use of ridge regression. Figures and listings not recoverable from transcription.]

73 Section 8.5 Commentary

74 Commentary (1) Collinearity is not a problem of existence/nonexistence; speak of its severity. (2) Collinearity is ubiquitous and involves the data matrix X. (3) Use the diagnostics of Belsley, Kuh, and Welsch to determine the number of degrading collinear relationships among the explanatory variables. (4) Based on the aforementioned diagnostics, the solutions to this issue involve the use of ridge regression, combining similar explanatory variables, or omitting some explanatory variables from the econometric model specification.
