Lecture 7: Linear Regression (continued)


Lecture 7: Linear Regression (continued). Reading: Chapter 3. STATS 202: Data mining and analysis. Jonathan Taylor, 10/8. Slide credits: Sergio Bacallado.

Potential issues in linear regression

1. Interactions between predictors
2. Non-linear relationships
3. Correlation of error terms
4. Non-constant variance of errors (heteroskedasticity)
5. Outliers
6. High leverage points
7. Collinearity

Outliers

Outliers are points with very high errors.

[Figure: Y vs. X, residuals vs. fitted values, and studentized residuals vs. fitted values.]

While they may not affect the fit, they might affect our assessment of model quality. Possible solutions:

- If we believe an outlier is due to an error in data collection, we can remove it.
- An outlier might be evidence of a missing predictor, or of the need to specify a more complex model.

High leverage points

Some samples with extreme inputs have an outsized effect on $\hat\beta$.

[Figure: Y vs. X, scatter of the two predictors, and studentized residuals vs. leverage.]

This can be measured with the leverage statistic or self-influence:

$$h_{ii} = \frac{\partial \hat y_i}{\partial y_i} = \big(X(X^TX)^{-1}X^T\big)_{i,i} \in [1/n,\, 1],$$

where $X(X^TX)^{-1}X^T$ is the hat matrix.
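To make the leverage statistic concrete, here is a minimal numpy sketch on hypothetical simulated data (not the lecture's dataset): it forms the hat matrix explicitly and reads the leverages off its diagonal. For large n one would avoid building the full matrix.

```python
import numpy as np

# Hypothetical design matrix with an intercept column and two predictors.
rng = np.random.default_rng(0)
n, p = 50, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])

H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix X (X^T X)^{-1} X^T
leverage = np.diag(H)                  # h_ii for each observation

# Each leverage lies in [1/n, 1], and the leverages sum to the number of columns of X.
print(leverage.min() >= 1 / n, np.isclose(leverage.sum(), X.shape[1]))
print("highest-leverage observation:", leverage.argmax())
```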

Studentized residuals

The residual $\hat\epsilon_i = y_i - \hat y_i$ is an estimate of the noise $\epsilon_i$.

The standard error of $\hat\epsilon_i$ is $\sigma\sqrt{1 - h_{ii}}$.

A studentized residual is $\hat\epsilon_i$ divided by its standard error. When the model is correct, it follows a Student t-distribution with $n - p - 2$ degrees of freedom.

[Figure: Y vs. X and studentized residuals vs. leverage, as above.]
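A short sketch of flagging outliers with studentized residuals, again on hypothetical data; it uses statsmodels' influence diagnostics, which expose both the leverages and the externally studentized residuals.

```python
import numpy as np
import statsmodels.api as sm

# Hypothetical data with one injected outlier.
rng = np.random.default_rng(1)
n = 50
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)
y[0] += 5.0  # artificial outlier

fit = sm.OLS(y, sm.add_constant(x)).fit()
influence = fit.get_influence()

leverage = influence.hat_matrix_diag            # h_ii
student = influence.resid_studentized_external  # t-distributed when the model is correct

# Observations with |studentized residual| > 3 are commonly flagged as possible outliers.
print(np.where(np.abs(student) > 3)[0])
```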

Collinearity

Two predictors are collinear if one explains the other well:

$$\text{limit} = a \cdot \text{rating} + b,$$

i.e. they contain the same information.

[Figure: scatter plots of Limit against Age and Limit against Rating.]

Collinearity

Problem: the coefficients become unidentifiable.

Consider the extreme case of using two identical predictors limit:

$$\text{balance} = \beta_0 + \beta_1 \cdot \text{limit} + \beta_2 \cdot \text{limit} = \beta_0 + (\beta_1 + 100) \cdot \text{limit} + (\beta_2 - 100) \cdot \text{limit}.$$

The fit $(\hat\beta_0, \hat\beta_1, \hat\beta_2)$ is just as good as $(\hat\beta_0, \hat\beta_1 + 100, \hat\beta_2 - 100)$.

[Figure: coefficient plots with axes $\beta_{Age}$ vs. $\beta_{Limit}$ and $\beta_{Rating}$ vs. $\beta_{Limit}$.]
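The same point can be seen numerically. The sketch below uses hypothetical data standing in for limit and balance, fits a model with the same predictor entered twice, and checks that trading 100 units between the two coefficients leaves the fitted values unchanged.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100
limit = rng.uniform(1000, 10000, size=n)
balance = 50.0 + 0.05 * limit + rng.normal(scale=20.0, size=n)

# Design matrix with an intercept and the SAME predictor entered twice.
X = np.column_stack([np.ones(n), limit, limit])

# X^T X is singular, so infinitely many coefficient vectors give the same fit;
# lstsq returns the minimum-norm solution, splitting the effect across the two copies.
beta_hat, *_ = np.linalg.lstsq(X, balance, rcond=None)
print(beta_hat)

# Shifting the two coefficients by +100 and -100 gives exactly the same fitted values.
beta_alt = beta_hat + np.array([0.0, 100.0, -100.0])
print(np.allclose(X @ beta_hat, X @ beta_alt))
```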

Collinearity

If two variables are collinear, we can easily diagnose this using their correlation.

A group of q variables is multicollinear if these variables contain less information than q independent variables. Pairwise correlations may not reveal multicollinear variables.

The Variance Inflation Factor (VIF) measures how necessary a variable is, or how predictable it is given the other variables:

$$VIF(\hat\beta_j) = \frac{1}{1 - R^2_{X_j \mid X_{-j}}},$$

where $R^2_{X_j \mid X_{-j}}$ is the $R^2$ statistic for a multiple linear regression of the predictor $X_j$ onto the remaining predictors.
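A small sketch of the VIF on hypothetical data, using the variance_inflation_factor helper from statsmodels; x1 and x2 are built to be nearly collinear, so both should show a large VIF while x3 does not.

```python
import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(size=n)
x2 = 2.0 * x1 + rng.normal(scale=0.1, size=n)  # nearly collinear with x1
x3 = rng.normal(size=n)                        # unrelated predictor

# Design matrix with an intercept column; the VIF is computed per predictor.
X = np.column_stack([np.ones(n), x1, x2, x3])
for j, name in zip(range(1, X.shape[1]), ["x1", "x2", "x3"]):
    print(name, round(variance_inflation_factor(X, j), 1))
```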

Comparing Linear Regression to K-nearest neighbors

Linear regression: the prototypical parametric method. Inference is straightforward.
KNN regression: the prototypical nonparametric method. Inference?

The KNN regression estimate averages the responses of the K training points nearest to x:

$$\hat f(x) = \frac{1}{K} \sum_{i \in N_K(x)} y_i.$$

[Figure: KNN regression fits over two predictors ($x_1$, $x_2$) for K = 1 and K = 9.]
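For a concrete comparison, here is a sketch with scikit-learn on hypothetical data simulated from a linear model: KNN with K = 1 interpolates the noise, K = 9 smooths it somewhat, and the linear fit recovers the true line.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(3)
x = rng.uniform(-1, 1, size=(100, 1))
y = 2.0 + 3.0 * x.ravel() + rng.normal(scale=0.5, size=100)

knn1 = KNeighborsRegressor(n_neighbors=1).fit(x, y)   # K = 1: very flexible, noisy
knn9 = KNeighborsRegressor(n_neighbors=9).fit(x, y)   # K = 9: smoother local average
lin = LinearRegression().fit(x, y)                    # parametric fit

grid = np.linspace(-1, 1, 5).reshape(-1, 1)
print("K=1:", knn1.predict(grid))
print("K=9:", knn9.predict(grid))
print("OLS:", lin.predict(grid))
```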

Comparing Linear Regression to K-nearest neighbors

Linear regression: the prototypical parametric method.
KNN regression: the prototypical nonparametric method.

Long story short:

- KNN is only better when the function f is not linear.
- When n is not much larger than p, even if f is nonlinear, linear regression can outperform KNN.
- KNN has smaller bias, but this comes at the price of higher variance.

KNN estimates for a simulation from a linear model

[Figure: KNN fits (y vs. x) with K = 1 and K = 9 on data simulated from a linear model.]

Linear models dominate KNN

[Figure: fitted curves (y vs. x) and test mean squared error as a function of 1/K.]

Increasing deviations from linearity

[Figure: two examples (y vs. x) with increasing non-linearity, each with test mean squared error as a function of 1/K.]

When there are more predictors than observations, Linear Regression dominates

[Figure: test mean squared error as a function of 1/K as the number of predictors grows (p = 1, 2, 3, 4, 10, ...).]

When p is large relative to n, each sample has no nearby neighbors; this is known as the curse of dimensionality. The variance of KNN regression is then very large.
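The effect can be reproduced in a few lines. On hypothetical data where the response depends only on the first predictor, adding noise predictors inflates KNN's test error much faster than linear regression's (a sketch, not the simulation behind the figure).

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(4)
n = 100
for p in [1, 2, 5, 10, 20]:
    X = rng.normal(size=(n, p))
    y = 2.0 * X[:, 0] + rng.normal(scale=0.5, size=n)   # only X[:, 0] matters
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

    knn_mse = mean_squared_error(
        y_te, KNeighborsRegressor(n_neighbors=9).fit(X_tr, y_tr).predict(X_te))
    ols_mse = mean_squared_error(
        y_te, LinearRegression().fit(X_tr, y_tr).predict(X_te))

    # KNN's test MSE deteriorates quickly as irrelevant dimensions are added.
    print(f"p={p:2d}  KNN MSE={knn_mse:.2f}  OLS MSE={ols_mse:.2f}")
```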
