Section 2.2: Covariance, Correlation, and Least Squares

Size: px

Start display at page:

Download "Section 2.2: Covariance, Correlation, and Least Squares"

Malcolm Riley
6 years ago
Views:

1 Section 2.2: Covariance, Correlation, and Least Squares Jared S. Murray The University of Texas at Austin McCombs School of Business Suggested reading: OpenIntro Statistics, Chapter 7.1, 7.2 1

2 A Deeper Look at Least Squares Estimates Last time we saw that least squares estimates had some special properties: The fitted values Ŷ and x were very dependent The residuals Y Ŷ and x had no apparent relationship The residuals Y Ŷ had a sample mean of zero What s going on? And what exactly are the least squares estimates? We need to review sample covariance and correlation 2

3 Covariance Measure the direction and strength of the linear relationship between Y and X n i=1 Cov(Y, X) = (Y i Ȳ)(X i X) n 1 (Yi Ȳ )(Xi X) < 0 (Yi Ȳ )(Xi X) > 0 Y Ȳ (Yi Ȳ )(Xi X) > X X (Yi Ȳ )(Xi X) < 0 s y = 15.98, s x = 9.7 Cov(X, Y) = How do we interpret that? 3

4 Correlation Correlation is the standardized covariance: corr(x, Y) = cov(x, Y) s 2 x s2 y = cov(x, Y) s x s y The correlation is scale invariant and the units of measurement don t matter: It is always true that 1 corr(x, Y) 1. This gives the direction (- or +) and strength (0 1) of the linear relationship between X and Y. 4

5 Correlation corr(y, X) = cov(x, Y) s 2 x s2 y = cov(x, Y) s x s y = = X Y (Yi Ȳ )(Xi X) > 0 (Yi Ȳ )(Xi X) < 0 (Yi Ȳ )(Xi X) > 0 (Yi Ȳ )(Xi X) < 0 X Ȳ 5

6 Correlation corr = corr = corr = corr =

7 Correlation Only measures linear relationships: corr(x, Y) = 0 does not mean the variables are not related corr = 0.01 corr = Also be careful with influential observations... 7

8 The Least Squares Estimates The values for b 0 and b 1 that minimize the least squares criterion are: b 1 = r xy s y s x b 0 = Ȳ b 1 X where, X and Ȳ are the sample mean of X and Y corr(x, y) = r xy is the sample correlation s x and s y are the sample standard deviation of X and Y These are the least squares estimates of β 0 and β 1. 8

9 The Least Squares Estimates The values for b 0 and b 1 that minimize the least squares criterion are: b 1 = r xy s y s x b 0 = Ȳ b 1 X How do we interpret these? b 0 ensures the line goes through ( x, ȳ) b 1 scales the correlation to appropriate units by multiplying with s y /s x (what are the units of b 1?) 9

10 # Computing least squares estimates "by hand" y = housing$price; x = housing$size rxy = cor(y, x) sx = sd(x) sy = sd(y) ybar = mean(y) xbar = mean(x) b1 = rxy*sy/sx b0 = ybar - b1*xbar print(b0); print(b1) ## [1] ## [1]

11 # We get the same result as lm() fit = lm(price~size, data=housing) print(fit) ## ## Call: ## lm(formula = Price ~ Size, data = housing) ## ## Coefficients: ## (Intercept) Size ##

12 Properties of Least Squares Estimates Remember from the housing data, we had: corr(ŷ, x) = 1 (a perfect linear relationship) corr(e, x) = 0 (no linear relationship) mean(e) = 0 (sample average of residuals is zero) 12

13 Why? What is the intuition for the relationship between Ŷ and e and X? Lets consider some crazy alternative line: Y Crazy line: X LS line: X X 13

14 Fitted Values and Residuals This is a bad fit We are underestimating the value of small houses and overestimating the value of big houses. Crazy Residuals corr(e, x) = -0.7 mean(e) = Clearly, we have left some predictive ability on the table X 14

15 Summary: LS is the best we can do As long as the correlation between e and X is non-zero, we could always adjust our prediction rule to do better. We need to exploit all of the predictive power in the X values and put this into Ŷ, leaving no Xness in the residuals. In Summary: Y = Ŷ + e where: Ŷ is made from X ; corr(x, Ŷ) = ±1. e is unrelated to X; corr(x, e) = 0. On average, our prediction error is zero: ē = n i=1 e i = 0. 15

16 Decomposing the Variance How well does the least squares line explain variation in Y? Remember that Y = Ŷ + e Since Ŷ and e are uncorrelated, i.e. corr(ŷ, e) = 0, var(y) = var(ŷ + e) = var(ŷ) + var(e) n i=1 (Y i Ȳ) 2 n i=1 = (Ŷ i Ŷ) 2 n i=1 + (e i ē) 2 n 1 n 1 n 1 Given that ē = 0, and the sample mean of the fitted values Ŷ = Ȳ (why?) we get to write: n (Y i Ȳ) 2 = i=1 n (Ŷ i Ȳ) 2 + n i=1 i=1 e 2 i 16

17 Decomposing the Variance SSR: Variation in Y explained by the regression line. SSE: Variation in Y that is left unexplained. SSR = SST perfect fit. Be careful of similar acronyms; e.g. SSR for residual SS. 17

18 Decomposing the Variance (Y i Ȳ) = Ŷ i + e i Ȳ Decomposing the Variance = The (Ŷ i ANOVA Ȳ) + e i Table 18

19 The Coefficient of Determination R 2 The coefficient of determination, denoted by R 2, measures how well the fitted values Ŷ follow Y: R 2 = SSR SST = 1 SSE SST R 2 is the proportion of variance in Y that is explained by the regression line (in the mathematical not scientific sense): R 2 = 1 Var(e)/Var(Y) 0 < R 2 < 1 For simple linear regression, R 2 = rxy 2. Similar caveats to sample correlation apply 19

20 R 2 for the Housing Data summary(fit) ## ## Call: ## lm(formula = Price ~ Size, data = housing) ## ## Residuals: ## Min 1Q Median 3Q Max ## ## ## Coefficients: ## Estimate Std. Error t value Pr(> t ) ## (Intercept) *** ## Size e-06 *** ## --- ## Signif. codes: 0 '***' '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: on 13 degrees of freedom ## Multiple R-squared: ,Adjusted R-squared:

21 R 2 for the Housing Data summary(fit) ## ## Call: ## lm(formula = Price ~ Size, data = housing) ## ## Residuals: ## Min 1Q Median 3Q Max ## ## ## Coefficients: ## Estimate Std. Error t value Pr(> t ) ## (Intercept) *** ## Size e-06 *** ## --- ## Signif. codes: 0 '***' '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: on 13 degrees of freedom ## Multiple R-squared: , Adjusted R-squared: ## F-statistic: 62 on 1 and 13 DF, p-value: 2.66e-06 21

22 R 2 for the Housing Data anova(fit) ## Analysis of Variance Table ## ## Response: Price ## Df Sum Sq Mean Sq F value Pr(>F) ## Size e-06 *** ## Residuals ## --- ## Signif. codes: 0 '***' '**' 0.01 '*' 0.05 '.' 0 R 2 = SSR SST = =

23 Back to Baseball Three very similar, related ways to look at a simple linear regression... with only one X variable, life is easy R 2 corr SSE OBP SLG AVG

Section 2.1: Intro to Simple Linear Regression & Least Squares

Section 2.1: Intro to Simple Linear Regression & Least Squares Jared S. Murray The University of Texas at Austin McCombs School of Business Suggested reading: OpenIntro Statistics, Chapter 7.1, 7.2 1 Regression: