Detecting and Circumventing Collinearity or Ill-Conditioning Problems

1 Chapter 8 Detecting and Circumventing Collinearity or Ill-Conditioning Problems

2 Section 8.1 Introduction

3 Multicollinearity/Collinearity/Ill-Conditioning The terms multicollinearity, collinearity, and ill-conditioning all refer to the same phenomenon, namely near linear dependencies among the set of predetermined variables. Unlike serial correlation and heteroscedasticity, this issue does not involve the disturbance term of the econometric model, nor does it involve the dependent or endogenous variable. The problem deals exclusively with the set of explanatory variables. Because of near linear dependencies among right-hand side variables, it is difficult to disentangle their separate effects. Quite simply, the right-hand side variables display redundant information. This issue is not a statistical problem; it is a data problem. This issue is the Rodney Dangerfield of applied econometrics.

4 Collinearity
Nature of the Problem
Consequences
Belsley, Kuh, Welsch Diagnostics: variance inflation factors, condition indices, variance-decomposition proportions
Circumvention of the Problem: ridge regression; dropping or combining explanatory variables

5 The Ballentine [Venn diagram of the areas shared by Y, X, and Z (yellow, blue, red, green, brown, orange, black); figure not recoverable from transcription.] We may graphically depict the issue of collinearity (see Peter Kennedy, 2009). In the multiple regression of Y on X and Z, OLS uses the blue area to estimate β_X and the green area to estimate β_Z; the information in the red area, where X and Z overlap, is discarded.

6 Formal Definition of Collinearity A near linear dependency among the regressor variables:
Σ_{j=0}^{k} a_j X_j ≈ 0, with the a_j not all zero,
where X_j is the j-th explanatory variable, j = 0, 1, 2, ..., k. Equivalently, a departure from orthogonality of the columns of the data matrix X: X^T X approaches singularity, and the elements of (X^T X)^{-1} explode.

7 Orthogonal Case versus Non-orthogonal Case [slide compares X^T X and (X^T X)^{-1} for an orthogonal and a non-orthogonal data matrix; the numeric matrices were lost in transcription, but an illustrative sketch follows below]. Key Points: (1) Sampling variances of the estimated OLS coefficients increase sharply as the columns of X depart from orthogonality. (2) Greater sampling covariances for the OLS coefficients.
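Since the slide's matrices did not survive transcription, here is a minimal SAS/IML sketch with illustrative numbers of our own (not the slide's originals) showing the same contrast: an orthogonal design yields a well-behaved (X^T X)^{-1}, while a nearly collinear design makes its elements explode.

proc iml;
   /* Orthogonal case: the two columns of X are uncorrelated */
   X1 = {1 1, 1 -1, -1 1, -1 -1};
   /* Near-collinear case: the second column nearly duplicates the first */
   X2 = {1 1.01, 1 0.99, -1 -1.02, -1 -0.98};
   print (t(X1)*X1)[label="X'X (orthogonal)"] (inv(t(X1)*X1))[label="inverse"];
   print (t(X2)*X2)[label="X'X (near-collinear)"] (inv(t(X2)*X2))[label="inverse"];
   /* The near-collinear inverse has entries on the order of 1000;
      the sampling variances and covariances blow up accordingly */
quit;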

8 Ill-Conditioning/Multicollinearity/Collinearity deals with specific characteristics of the data matrix X; it is a data problem, not a statistical problem. In econometric applications, we often end up using regressors with high correlation. Speak in terms of the severity of collinearity rather than of its existence or nonexistence. Collinearity affects the structural integrity of econometric models. The opposite of collinear is orthogonal. Recall the dummy variable trap from Chapter 5: the dummy variable trap is an example of perfect collinearity.

9 Collinearity constitutes a threat to proper specification and effective estimation of a structural relationship. Consequences: larger variances (standard errors) of regression coefficients, since VAR(β̂) = σ²(X^T X)^{-1}; these consequences are indistinguishable from those of inadequate variability in the regressors. Covariances among parameter estimates are often large and of the wrong sign. Difficulties in interpretation. Confidence regions for parameters are wide. Increase in Type II error (accepting H0 when H0 is false), that is, a decrease in the power of statistical tests (e.g., t, F).

10 Section 8.2 Collinearity Diagnostics

11 Collinearity Diagnostics Multicollinearity refers to the presence of highly intercorrelated exogenous variables in regression models. It is not surprising that it is considered one of the most ubiquitous, significant, and difficult problems in applied econometrics, often referred to by modelers as the familiar curse. Collinearity diagnostics measure how much regressors are related to other regressors and how these relationships affect the stability and variance of the regression estimates.

12 Signs of Collinearity Signs of collinearity in a regression analysis include:
(1) Large standard errors on regression coefficients, so that estimates of the true model parameters become unstable and low t-values prevail.
(2) Parameter estimates that vary considerably from sample to sample.
(3) Drastic changes in the regression estimates after only minor data revisions.
(4) Conflicting conclusions from the usual tests of significance (such as the wrong sign for a parameter).
(5) Extreme correlations between pairs of variables.
(6) Omitting a variable from the equation results in smaller regression standard errors.
(7) A good fit that does not provide good forecasts.

13 We use collinearity diagnostics to:
(1) produce a set of condition indices that signal the presence of one or more near dependencies among the variables. (Linear dependency, an extreme form of collinearity, occurs when there is an exact linear relationship among the variables.)
(2) uncover those variables that are involved in particular near dependencies and to assess the degree to which the estimated regression coefficients are degraded by the presence of the near dependencies.
In practice, if one exogenous variable is highly correlated with the other explanatory variables, it is extremely unlikely that the exogenous variable in question contributes substantially to the prediction equation.

14 Variance Inflation Factors (VIFs) The variance inflation factor for variable i is defined as
VIF_i = 1 / (1 − R_i²) = 1 / Tolerance_i, where Tolerance_i = 1 − R_i²
and R_i² is the squared multiple correlation from regressing the i-th exogenous variable on the other exogenous variables. As R_i² approaches unity, the corresponding VIF becomes infinite. If the exogenous variables are orthogonal to each other (no correlation), each variance inflation factor is 1.0. VIFs > 10 signify potential collinearity problems. This indicator is a rule of thumb, devoid of any statistical clothing. continued...

15 VIF_i thus provides us with a measure of how many times larger the variance of the i-th regression coefficient is for collinear data than for orthogonal data (where each VIF is 1.0). If the VIFs are not too much larger than 1.0, collinearity is not a problem. An advantage of knowing the VIF for each variable is that it gives the analyst a tangible idea of how much the variances of the estimated coefficients are degraded by collinearity.
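In SAS, VIFs and tolerances are requested with the VIF and TOL options in the MODEL statement of PROC REG; the data set and variable names below are placeholders, a minimal sketch rather than a specific application.

proc reg data=mydata;
   /* VIF and TOL append variance inflation factors and tolerances
      to the parameter-estimates table */
   model y = x1 x2 x3 x4 / vif tol;
run;
quit;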

16 Condition Indices det(X^T X) = λ_1 λ_2 ... λ_p, the product of the eigenvalues of X^T X. A small determinant means some (or many) of the eigenvalues are small. Belsley, Kuh, Welsch diagnostic tools: the condition number of the X matrix, κ(X) = µ_MAX / µ_MIN = sqrt(λ_MAX / λ_MIN), and the condition indices, η_s = µ_MAX / µ_s, s = 1, ..., p, where µ_s = sqrt(λ_s) is the s-th singular value. The eigenvalues of X^T X are the squares of the singular values of X.

17 η_s = µ_MAX / µ_s, s = 1, ..., p, is the s-th condition index of the n×p data matrix X. The condition indices are the square roots of the ratios of the largest eigenvalue to each individual eigenvalue. The largest condition index is the condition number of the X matrix. Key Points: There are as many near dependencies among the columns of a data matrix X as there are high condition indices. Weak dependencies are associated with condition indices around 5 or 10. Moderate to strong relations are associated with condition indices > 30. Again, these cutoffs are rules of thumb provided by Belsley, Kuh, and Welsch based on Monte Carlo experiments.

18 Variance Decomposition Proportions For each right-hand side variable, we can express the variance of the estimated coefficient as a function of the eigenvalues (singular values). Interest lies in the proportion of the variance associated with individual eigenvalues (singular values). A degrading collinearity problem occurs when two or more variables have variance proportions greater than 0.5 in association with a condition index ≥ 30. It is quite possible to have more than one degrading collinearity problem among the explanatory variables.

19 The BKW display pairs each eigenvalue λ_s of X^T X (equivalently each singular value µ_s = sqrt(λ_s), s = 1, ..., p) and its condition index η_s = µ_MAX / µ_s with a row of variance-decomposition proportions Π_s0, Π_s1, ..., Π_sp, one column for each of VAR(β̂_0), VAR(β̂_1), ..., VAR(β̂_p). The variance proportions in each column must sum to 1: Σ_{s=1}^{p} Π_sj = 1, j = 1, 2, ..., p. Diagnostic Procedure: flag a degrading dependency when (1) η_s ≥ 30 and simultaneously (2) two or more Π_sj ≥ 0.5. Source: Belsley, Kuh, Welsch, Regression Diagnostics: Identifying Influential Data and Sources of Collinearity (1980), John Wiley & Sons.
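The table above can be computed directly. Here is a sketch in SAS/IML, with placeholder data set and variable names (mydata, x1-x3), that scales the columns of X to unit length as BKW recommend, extracts the singular values, and forms the condition indices and variance-decomposition proportions.

proc iml;
   use mydata;  read all var {x1 x2 x3} into X;  close mydata;
   X = j(nrow(X), 1, 1) || X;            /* prepend the intercept column */
   X = X * diag(1/sqrt(X[##,]));         /* scale each column to unit length */
   call svd(U, Q, V, X);                 /* Q holds the singular values mu_s */
   condindex = max(Q) / Q;               /* eta_s = mu_MAX / mu_s */
   phi = (V##2) * diag(1/(Q##2));        /* phi_js = v_js**2 / mu_s**2 */
   /* Pi_sj = share of VAR(beta_j) tied to singular value s; columns sum to 1 */
   vdp = t( phi / repeat(phi[,+], 1, ncol(phi)) );
   print condindex, vdp;
quit;

Rows of vdp with a condition index of 30 or more and two or more proportions above 0.5 flag degrading dependencies.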

20 Examples of Collinearity Diagnostics Note that variables 1, 2, 4, 5, and 6 have VIFs that are greater than 10. Consequently, these variables are suspects in the case of collinearity.

21 Examples of Collinearity Diagnostics Examination of the condition index column reveals three potentially degrading dependency situations. The condition number of this matrix is about 43. Examining the variance proportions simultaneously with condition indices ≥ 30 reveals that (1) the intercept and variables 1, 4, and 5 are involved in one degrading collinearity situation, and (2) variables 2 and 6 form yet another degrading collinearity situation.

22 Section 8.3 Solutions to the Collinearity Problem

23 Solutions to the Collinearity Problem Realize that the solutions to this ubiquitous problem are non-unique. Based on the Belsley-Kuh-Welsch (BKW) diagnostics, we may uncover the nature of the collinear relationships among the explanatory variables. That is, we may drop or combine right-hand side variables to mitigate collinearity situations. While straightforward, this solution loses information in the original structural specification, and the structural integrity of the model is called into question. Alternatively, use ridge regression. There are bias-variance tradeoffs with ridge regression: the ridge estimates are biased, but they typically provide a noteworthy reduction in the variance of the estimated parameters.

24 Ridge Regression (Hoerl and Kennard, 1970) Minimize L(β, k) = (Y − Xβ)^T (Y − Xβ) + k β^T β, k > 0. Setting dL/dβ = 0 yields the ridge estimator
β̂_R = (X^T X + kI)^{-1} X^T Y.
Its moments are
E[β̂_R] = (X^T X + kI)^{-1} X^T X β
VAR[β̂_R] = σ² (X^T X + kI)^{-1} X^T X (X^T X + kI)^{-1}
and the ridge estimator is a shrunken version of OLS: β̂_R = (X^T X + kI)^{-1} X^T X β̂_OLS. There always exists a positive number k such that
MSE[β̂_R] < MSE[β̂_OLS], where MSE = variance + (bias)². continued...
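A direct SAS/IML evaluation of the estimator just derived makes the algebra concrete. The data set and variable names are placeholders, and in practice the regressors are typically centered and scaled before k is applied.

proc iml;
   use mydata;
   read all var {x1 x2 x3} into x;  read all var {y} into y;
   close mydata;
   k = 0.02;                                     /* an assumed biasing parameter */
   xpx = t(x) * x;
   b_ols   = solve(xpx, t(x)*y);                 /* (X'X)^{-1} X'Y */
   b_ridge = solve(xpx + k*i(ncol(x)), t(x)*y);  /* (X'X + kI)^{-1} X'Y */
   print b_ols b_ridge;
quit;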

25 k is called the biasing parameter; the greater the value of k, the greater the bias. Consequently, it is desirable to keep the value of k as small as possible. Another issue with ridge regression is the choice of k. Hoerl and Kennard recommend a ridge trace, a plot of the estimated coefficients as a function of k. The biasing parameter is chosen where the plot first begins to stabilize. A myriad of statistical studies exist regarding how to choose k. Our preference is for a ridge tableau, wherein both the parameter estimates and standard errors are reported for various values of k.
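In PROC REG, a grid of k values is supplied with the RIDGE= option in the PROC REG statement; OUTSEB and OUTVIF write the corresponding standard errors and VIFs to the OUTEST= data set, which then serves as the ridge tableau. A sketch with placeholder names:

proc reg data=mydata outest=ridge_est outseb outvif
         ridge=0 to 0.05 by 0.005;
   model y = x1 x2 x3;
   /* the traditional PLOT statement can draw the ridge trace */
   plot / ridgeplot nomodel nostat;
run;
quit;

proc print data=ridge_est;   /* the tableau: estimates, SEs, VIFs by k */
run;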

26 Example of a Ridge Tableau [tableau entries lost in transcription]. Note that the estimated standard errors decrease as the biasing parameter k increases.

27 Ridge Trace of Estimates for California Central Valley Region The study region chosen for this application was the irrigated central California valley. County-level data were taken from the 1974 Census of Agriculture. A Cobb-Douglas production function was hypothesized, where output was the value of crops harvested ($/county) and the inputs were labor expenditures ($/county), value of machinery ($/county), quantity of irrigation water applied (acre-feet/county), energy expenditures ($/county), and miscellaneous expenditures ($/county). Not surprisingly, the OLS estimates reveal the usual problems (unexpected signs, large standard errors) associated with a collinear data set. continued...

28 [Slide figure not recoverable from transcription; per the preceding slide's title, presumably the ridge trace of estimates for the California Central Valley region.]

29 Collinearity Diagnostics with Proc Reg MODEL-statement options: COLLIN, COLLINOINT (diagnostics with the intercept removed), TOL, and VIF. With the COLLIN option, X^T X is scaled so that it reads as a correlation matrix. TOL = 1/VIF. Ridge regression is requested with the RIDGE= option in the PROC REG statement.
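Putting the diagnostic options together in one MODEL statement (placeholder data set and variables); COLLINOINT repeats the diagnostics with the intercept adjusted out, which is appropriate when the intercept is not of substantive interest.

proc reg data=mydata;
   /* request VIFs, tolerances, and the BKW condition-index /
      variance-proportion tables with and without the intercept */
   model y = x1 x2 x3 / vif tol collin collinoint;
run;
quit;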

30 Section 8.4 EXAMPLES

31 Example 1 Norton, G.W., J.D. Coffey, and E.B. Frye, "Estimating Returns to Agricultural Research, Extension, and Teaching at the State Level," Southern Journal of Agricultural Economics (July 1984). How many potentially degrading collinear situations exist? Which variables are involved in collinearity situations?

32 OLS and Selected Ridge Regression (RR) Estimates, Standard Errors, and Variance Inflation Factors for the Production Function Example [parameter estimates and most VIFs lost in transcription; surviving standard errors shown in parentheses]

RHS Variable   K = 0 (OLS)   K = 0.01 (RR)   K = 0.02 (RR)   K = 0.03 (RR)
Intercept      (3.997)       (1.282)         (0.811)         (0.599)
Expenses       (0.214)       (0.076)         (0.051)         (0.039)
Capital        (0.193)       (0.071)         (0.047)         (0.035)
Labor          (0.152)       (0.097)         (0.081)         (0.073)
Land           (0.537)       (0.173)         (0.113)         (0.085)
Rain           (0.007)       (0.007)         (0.007)         (0.007)   VIF = 1.43

33 OLS and Selected Ridge Regression (RR) Estimates, Standard Errors, and Variance Inflation Factors for the Production Function Example (continued) [most entries lost in transcription; surviving standard errors shown in parentheses]

RHS Variable   K = 0 (OLS)   K = 0.01 (RR)   K = 0.02 (RR)   K = 0.03 (RR)
Research       ( )           ( )             ( )             ( )
Extension      (0.0034)      (0.0014)        (0.0011)        (0.0009)
Teaching       ( )           ( )             ( )             ( )

R² and DW statistics reported on the slide were lost in transcription. Choice of k?

34 [Slide figure not recoverable from transcription.]

35 Dependent Variable: output. Number of Observations Read: 31; Used: 31. [PROC REG analysis-of-variance table: Model Pr > F < .0001; the remaining entries (sums of squares, Root MSE, R-Square, Adj R-Sq, Coeff Var) were lost in transcription.] continued...

36 Parameter Estimates with VIFs [table of parameter estimates, standard errors, t values, and variance inflation factors for Intercept, expenses, capital, labor, land, rain, research, extension, and teaching; numeric entries lost in transcription.] continued...

37 Collinearity Diagnostics [table of eigenvalues, condition indices, and proportions of variation for Intercept, expenses, capital, and labor; numeric entries lost in transcription.] continued...

38 Collinearity Diagnostics (continued) [proportions of variation for land, rain, research, extension, and teaching; numeric entries lost in transcription.] Condition Indices and Variance Proportions. continued...

39 The REG Procedure, Dependent Variable: output. [Durbin-Watson D and 1st-order autocorrelation for the fit, 31 observations, followed by the analysis-of-variance table of a second run (Model Pr > F < .0001); numeric entries lost in transcription.] continued...

40 Parameter Estimates [OLS parameter estimates, standard errors, and t values for Intercept, expenses, capital, labor, land, rain, research, extension, and teaching; numeric entries lost in transcription.] continued...

41-46 [Durbin-Watson D and 1st-order autocorrelation for 31 observations, followed by the SAS OUTEST= listing from the ridge regression run: for each value of the biasing parameter (_RIDGE_), rows of _TYPE_ = RIDGE (parameter estimates), RIDGESEB (standard errors), and RIDGEVIF (variance inflation factors) for Intercept, expenses, capital, labor, land, rain, research, extension, and teaching; numeric entries lost in transcription.]

47 EXAMPLE 2 Demand for Fish Meal

48 Dependent Variable: lnfm. Number of Observations Read: 25; Used: 25. [PROC REG analysis-of-variance table (Model Pr > F < .0001) and parameter estimates with standard errors, t values, and variance inflation factors for Intercept, lnpfm, lnpsm, lnpbs, lnpcorn, lnpbc, lnfm1, and t; numeric entries lost in transcription.]

49 Collinearity Diagnostics [eigenvalues, condition indices, and proportions of variation for Intercept, lnpfm, lnpsm, lnpbs, and lnpcorn; numeric entries lost in transcription.] Conclusion from these diagnostics?

50 The REG Procedure, Dependent Variable: lnfm. Collinearity Diagnostics (continued) [proportions of variation for lnpbc, lnfm1, and t; numeric entries lost in transcription.] Durbin-Watson D and 1st-order autocorrelation reported for 25 observations. Conclusion from these diagnostics? NOTE: No serial correlation problem is indicated by the Durbin-Watson statistic.

51 [Slide figure not recoverable from transcription.]

52 FISH MEAL EXAMPLE [OLS and ridge regression (RR) estimates with standard errors in parentheses for Intercept, LNPFM, LNPSM, LNPBS, LNPCORN, LNPBC, LNFM1, and t at K = 0 (OLS), K = 0.05 (RR), and K = 0.10 (RR), together with R² and DW; numeric entries lost in transcription.] Use of the ridge trace. Choice of k?

53-58 SAS Output concerning the Use of Ridge Regression [OUTEST= listing for the fish meal model: for each _RIDGE_ value, rows of _TYPE_ = RIDGE (parameter estimates), RIDGESEB (standard errors), and RIDGEVIF (variance inflation factors) for Intercept, lnfm1, lnpbs, lnpfm, lnpsm, lnpbc, lnpcorn, and t; numeric entries lost in transcription.]

59 EXAMPLE 3 Demand Function for Frosted Flakes 20 oz. Size Retailer: Publix

60-72 [Example 3 slides: ridge trace for this example; OLS parameter estimates; collinearity diagnostics; conclusion from the collinearity diagnostics?; SAS output concerning the use of ridge regression. Figures and listings not recoverable from transcription.]

73 Section 8.5 Commentary

74 Commentary (1) Collinearity is not a problem of existence/nonexistence; speak of its severity. (2) Collinearity is ubiquitous and involves the data matrix X. (3) Use the diagnostics of Belsley, Kuh, and Welsch to determine the number of degrading collinear relationships among the explanatory variables. (4) Based on the aforementioned diagnostics, the solutions to this issue involve the use of ridge regression, combining similar explanatory variables, or omitting some explanatory variables from the econometric model specification.
