Statistical foundations of Machine Learning INFO-F-422 TP: Linear Regression


Catharina Olsen and Gianluca Bontempi
March 12,

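1 Review
1.1 Estimation using the mean square error
Assume we have $N$ observation pairs $(x_i, y_i)$, $i = 1, \dots, N$, generated by the stochastic process
$$y_i = \beta_0 + \beta_1 x_i + w_i,$$
where the $w_i$ are iid realisations of a random variable $w$ with zero mean and constant variance $\sigma_w^2$. The $x_i$ can be regarded as fixed, so the only random component of the dataset $D_N$ lies in the $y_i$, which are random because of the $w_i$. The coefficients $\beta_0$ and $\beta_1$ can be estimated with the least-squares method, which takes as estimators the values $\hat\beta_0$ and $\hat\beta_1$ that minimize
$$R_{\mathrm{emp}} = \sum_{i=1}^{N} (y_i - \hat y_i)^2, \qquad (1)$$
where
$$\hat y_i = \hat\beta_0 + \hat\beta_1 x_i. \qquad (2)$$
This is equivalent to
$$\{\hat\beta_0, \hat\beta_1\} = \arg\min_{b_0, b_1} \sum_{i=1}^{N} (y_i - b_0 - b_1 x_i)^2. \qquad (3)$$
The solution is given by
$$\hat\beta_1 = \frac{S_{xy}}{S_{xx}}, \qquad \hat\beta_0 = \bar y - \hat\beta_1 \bar x, \qquad (4)$$
where
$$\bar x = \frac{1}{N}\sum_{i=1}^{N} x_i, \quad \bar y = \frac{1}{N}\sum_{i=1}^{N} y_i, \quad S_{xy} = \sum_{i=1}^{N} (x_i - \bar x)\, y_i, \quad S_{xx} = \sum_{i=1}^{N} (x_i - \bar x)^2. \qquad (5)$$
1.2 Properties of the estimators
$$E_{D_N}[\hat\beta_1] = \beta_1, \qquad \mathrm{Var}[\hat\beta_1] = \frac{\sigma_w^2}{S_{xx}},$$
$$E_{D_N}[\hat\beta_0] = \beta_0, \qquad \mathrm{Var}[\hat\beta_0] = \sigma_w^2 \left( \frac{1}{N} + \frac{\bar x^2}{S_{xx}} \right).$$
Moreover,
$$\hat\sigma_w^2 = \frac{\sum_{i=1}^{N} (y_i - \hat y_i)^2}{N - 2}$$
is an unbiased estimator of $\sigma_w^2$.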
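A minimal R sketch of equations (2), (4) and (5) and of the estimator $\hat\sigma_w^2$. It assumes simulated data: the sample size, the coefficients 2 and 0.5 and the noise level are arbitrary illustration values, not taken from the course. The built-in lm fit is used only as a cross-check.

```r
# Simulated data (hypothetical values chosen for illustration)
set.seed(1)
N <- 50
x <- runif(N, min = 0, max = 10)      # inputs, treated as fixed
y <- 2 + 0.5 * x + rnorm(N, sd = 1)   # y_i = beta0 + beta1 * x_i + w_i

# Least-squares estimates, equations (4) and (5)
x.bar <- mean(x)
y.bar <- mean(y)
S.xy <- sum((x - x.bar) * y)
S.xx <- sum((x - x.bar)^2)
beta.hat.1 <- S.xy / S.xx
beta.hat.0 <- y.bar - beta.hat.1 * x.bar

# Fitted values, equation (2), and the unbiased noise-variance estimate
y.hat <- beta.hat.0 + beta.hat.1 * x
var.hat.w <- sum((y - y.hat)^2) / (N - 2)

# Cross-check with R's built-in least-squares fit
fit <- lm(y ~ x)
print(c(beta.hat.0 = beta.hat.0, beta.hat.1 = beta.hat.1, var.hat.w = var.hat.w))
print(coef(fit))
print(summary(fit)$sigma^2)   # residual variance reported by lm
```

Because $\sum_i (x_i - \bar x) = 0$, the definition of $S_{xy}$ in (5) gives the same value as $\sum_i (x_i - \bar x)(y_i - \bar y)$, so either form can be used.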

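1.3 Partitioning the variability
The variability of the response $y_i$ can be decomposed as
$$\sum_{i=1}^{N} (y_i - \bar y)^2 = \sum_{i=1}^{N} (\hat y_i - \bar y)^2 + \sum_{i=1}^{N} (y_i - \hat y_i)^2, \qquad (6)$$
that is,
$$SS_{\mathrm{tot}} = SS_{\mathrm{mod}} + SS_{\mathrm{res}}. \qquad (7)$$
1.4 The F-test
Goal: test whether the variable $y$ is really influenced by the variable $x$. This can be formulated as a test of the hypothesis $\beta_1 = 0$: if this hypothesis is rejected, we conclude that $x$ influences $y$ significantly. It can be shown that, for normally distributed $w$ and if the hypothesis $\beta_1 = 0$ is true,
$$\frac{SS_{\mathrm{mod}}}{SS_{\mathrm{res}}/(N-2)} \sim F_{1,N-2}. \qquad (8)$$
1.5 The t-test
It can be shown that, for normally distributed $w$,
$$\hat\beta_1 \sim \mathcal{N}\left(\beta_1, \frac{\sigma_w^2}{S_{xx}}\right) \qquad (9)$$
and
$$\frac{(\hat\beta_1 - \beta_1)\sqrt{S_{xx}}}{\hat\sigma_w} \sim T_{N-2}. \qquad (10)$$
This can be used to test the hypothesis $\beta_1 = \bar\beta$ for a given value $\bar\beta$.
1.6 Confidence intervals
With probability $1 - \alpha$, the interval
$$\hat\beta_1 \pm t_{\alpha/2,N-2} \sqrt{\frac{\hat\sigma_w^2}{S_{xx}}} \qquad (11)$$
contains the true parameter $\beta_1$, where $t_{\alpha/2,N-2}$ is the upper $\alpha/2$ quantile of the Student distribution with $N-2$ degrees of freedom.
1.7 Variance of the response
Let
$$\hat y(x) = \hat\beta_0 + \hat\beta_1 x. \qquad (12)$$
It can be shown that, for all $x$,
$$E_{D_N}[\hat y(x)] = E_y[y(x)] \qquad (13)$$
and
$$\mathrm{Var}[\hat y(x)] = \sigma_w^2 \left[ \frac{1}{N} + \frac{(x - \bar x)^2}{S_{xx}} \right]. \qquad (14)$$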
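The tests and the interval of sections 1.4 to 1.6 can be sketched in R with pf, pt and qt. Again this assumes simulated data with arbitrary parameters, for illustration only; the t-test below is two-sided with $\bar\beta = 0$, and the variable names follow those printed in Exercise 1.

```r
# Simulated data (hypothetical values chosen for illustration)
set.seed(2)
N <- 30
x <- seq(1, 10, length.out = N)
y <- 1 + 0.3 * x + rnorm(N, sd = 2)

# Least-squares fit, sums of squares (6)-(7) and noise estimate
x.bar <- mean(x)
S.xx <- sum((x - x.bar)^2)
beta.hat.1 <- sum((x - x.bar) * y) / S.xx
beta.hat.0 <- mean(y) - beta.hat.1 * x.bar
y.hat <- beta.hat.0 + beta.hat.1 * x
SS.mod <- sum((y.hat - mean(y))^2)
SS.res <- sum((y - y.hat)^2)
sigma.hat <- sqrt(SS.res / (N - 2))

# F-test of beta1 = 0, equation (8)
F.value <- SS.mod / (SS.res / (N - 2))
F.pr <- pf(F.value, df1 = 1, df2 = N - 2, lower.tail = FALSE)

# Two-sided t-test of beta1 = 0, equation (10) with beta1 set to 0
t.value <- beta.hat.1 * sqrt(S.xx) / sigma.hat
t.pr <- 2 * pt(abs(t.value), df = N - 2, lower.tail = FALSE)

# 95% confidence interval for beta1, equation (11)
alpha <- 0.05
half.width <- qt(1 - alpha / 2, df = N - 2) * sigma.hat / sqrt(S.xx)
conf.interval.min <- beta.hat.1 - half.width
conf.interval.max <- beta.hat.1 + half.width

print(c(F.value = F.value, F.pr = F.pr, t.value = t.value, t.pr = t.pr))
print(c(conf.interval.min, conf.interval.max))
```

In simple regression the two tests agree: the F statistic is the square of the t statistic and the p-values coincide. Using lower.tail = FALSE avoids the loss of precision of 1 - pf(...) when the p-value is very small.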

2 Linear regression exercises
2.1 Exercise 1
Compare with the theoretical part of this course (slides 7 and 28 of the chapter "Regression Modelling"). The goal of this exercise is to investigate the relationship between two variables from a medical data set by studying the ventricular shortening velocity as a function of blood glucose.

    # data preparation
    library(ISwR)
    data(thuesen)
    I <- !is.na(thuesen[, "short.velocity"])
    Y <- thuesen[I, "short.velocity"]
    X <- thuesen[I, "blood.glucose"]

(a) Apply the least-squares method by hand, using equations (4) and (5), to compute the estimates $\hat\beta_0$ and $\hat\beta_1$ of the coefficients of a linear model for these data.

    print(paste("beta.hat.0 = ", beta.hat.0))
    print(paste("beta.hat.1 = ", beta.hat.1))

(b) Test the hypothesis $\beta_1 = 0$, first with an F-test using equation (8) and the F distribution function pf, then with a t-test using equation (10) and the t distribution function pt.

    print(paste("F-test result: F.value = ", F.value))
    print(paste("Pr[F >= F.value] = ", F.pr))
    print(paste("t-test result: t.value = ", t.value))

    print(paste("Pr[|T| >= |t.value|] = ", t.pr))

(c) Compute the confidence interval for $\beta_1$ using equation (11) and the function qt.

    print(paste("Confidence interval for beta1 ="))
    print(paste("(", conf.interval.min, ",", conf.interval.max, ")"))

(d) Use the function lm to obtain the same results automatically and compare them with those obtained above.
(e) Visualize the data and the regression line.
[Figures: "Histogram of Y"; scatter plot of Y against X]
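For parts (d) and (e), a sketch of where lm stores the quantities computed by hand, followed by the requested plots. So that it runs without the ISwR package, it uses a simulated stand-in for X and Y (hypothetical values, not the thuesen data); with the real data, drop the first block and reuse the X and Y from the data-preparation code.

```r
# Simulated stand-in for X and Y (hypothetical values; with the real data,
# use the X and Y built in the data-preparation code instead)
set.seed(3)
X <- runif(25, min = 4, max = 20)
Y <- 1.1 + 0.02 * X + rnorm(25, sd = 0.2)

# (d) The same quantities, obtained automatically with lm
fit <- lm(Y ~ X)
coef.table <- coef(summary(fit))   # estimates, std. errors, t values, p-values
beta.hat.0 <- coef.table["(Intercept)", "Estimate"]
beta.hat.1 <- coef.table["X", "Estimate"]
t.value <- coef.table["X", "t value"]
t.pr <- coef.table["X", "Pr(>|t|)"]                    # two-sided p-value
F.value <- unname(summary(fit)$fstatistic["value"])   # F statistic of equation (8)
conf.int <- confint(fit, "X", level = 0.95)           # interval of equation (11)

print(coef.table)
print(anova(fit))   # SS_mod and SS_res (column "Sum Sq") and the F-test
print(conf.int)

# (e) Visualize the data and the fitted regression line
par(mfrow = c(1, 2))
hist(Y)
plot(X, Y)
abline(fit, col = "red")
```

In the anova table, the first row holds $SS_{\mathrm{mod}}$ and the second $SS_{\mathrm{res}}$, so its F value is the statistic of equation (8).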

2.2 Exercise 2
The goal of this exercise is to study experimentally the bias and the variance of $\hat\beta_0$, $\hat\beta_1$, $\hat\sigma_w^2$ and $\hat y(x_i)$. See also the theoretical part of this course (slide 27 of the chapter "Regression Modelling").

    ## Fix model, data and number of iterations
    rm(list = ls())
    X <- seq(-10, 10, by = 1)  # the x_i are fixed
    beta0 <- -1                # y_i = -1 + x_i + Normal(0, 5)
    beta1 <- 1
    sd.w <- 5
    N <- length(X)
    R <- 100 #00               # number of iterations for the simulation
    ## Initialize
    beta.hat.1 <- numeric(R)
    beta.hat.0 <- numeric(R)
    var.hat.w <- numeric(R)
    Y.hat <- array(NA, c(R, N))

(a) Compute $\hat\beta_0$, $\hat\beta_1$ and $\hat\sigma_w^2$ and plot their distributions.
[Figures: histograms "Distribution of beta.hat.1: beta1 = 1", "Distribution of beta.hat.0: beta0 = -1" and "Distribution of var.hat.w: var w = 25"]
(b) Illustrate the theorem
$$\mathrm{Var}[\hat y(x)] = \sigma_w^2 \left( \frac{1}{N} + \frac{(x - \bar x)^2}{S_{xx}} \right)$$
by comparing, at several points $x_i$, the theoretical variance of the prediction with the variance observed over the simulations.

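One possible way to complete the simulation of Exercise 2, sketched under the assumption that each iteration draws a fresh dataset with the same fixed $x_i$. The seed and R <- 10000 are choices made here for smoother estimates, not values prescribed by the sheet.

```r
set.seed(4)
X <- seq(-10, 10, by = 1)   # fixed inputs, as in the exercise
beta0 <- -1
beta1 <- 1
sd.w <- 5
N <- length(X)
R <- 10000                  # more iterations than the sheet's default

x.bar <- mean(X)
S.xx <- sum((X - x.bar)^2)
beta.hat.1 <- numeric(R)
beta.hat.0 <- numeric(R)
var.hat.w <- numeric(R)
Y.hat <- array(NA, c(R, N))

for (r in 1:R) {
  Y <- beta0 + beta1 * X + rnorm(N, sd = sd.w)   # a new data set D_N
  beta.hat.1[r] <- sum((X - x.bar) * Y) / S.xx
  beta.hat.0[r] <- mean(Y) - beta.hat.1[r] * x.bar
  Y.hat[r, ] <- beta.hat.0[r] + beta.hat.1[r] * X
  var.hat.w[r] <- sum((Y - Y.hat[r, ])^2) / (N - 2)
}

# (a) Empirical means versus true values (no bias expected)
print(c(mean(beta.hat.0), beta0, mean(beta.hat.1), beta1, mean(var.hat.w), sd.w^2))
par(mfrow = c(1, 3))
hist(beta.hat.1, main = paste("Distribution of beta.hat.1: beta1 =", beta1))
hist(beta.hat.0, main = paste("Distribution of beta.hat.0: beta0 =", beta0))
hist(var.hat.w, main = paste("Distribution of var.hat.w: var w =", sd.w^2))

# (b) Variance of the prediction at each x_i: equation (14) versus simulation
var.theo <- sd.w^2 * (1 / N + (X - x.bar)^2 / S.xx)
var.obs <- apply(Y.hat, 2, var)
print(cbind(x = X, theoretical = var.theo, observed = var.obs))
```

The observed variances approach the theoretical ones as R grows; with R = 100 they still fluctuate noticeably.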

3 Multiple regression exercise
This example is taken from the theoretical part of this course (slide 36 of the chapter "Regression Modelling"). Multiple linear dependence occurs when the input $x$ is a vector instead of a scalar. The goal of this exercise is to verify the theoretical results for the least-squares estimators $\hat\sigma_w^2$ and $\hat\beta$ (absence of bias and the analytical expression of $\mathrm{Var}[\hat\beta]$).

    ## Initialize
    rm(list = ls())
    library(MASS)
    # initial values for n, sigma_w and beta
    n <- 3                    # number of input variables
    p <- n + 1                # number of parameters, including the intercept
    beta <- seq(2, p + 1)     # beta = (2, 3, ..., n + 2)
    sd.w <- 5
    # generating data D_N
    N <- 100                  # number of samples
    X <- array(runif(N * n, min = -20, max = 20), c(N, n))
    X <- cbind(array(1, c(N, 1)), X)   # N x p design matrix with a column of ones
    R <- 100 #00              # number of iterations
    beta.hat <- array(0, c(p, R))
    var.hat.w <- numeric(R)
    Y.hat <- array(NA, c(R, N))

(a) Compute $\hat Y$, $\hat\beta$ and $\hat\sigma_w^2$ following the equations in the course slides 33, 35 and 37.
(b) Plot the histograms of $\hat\sigma_w^2$ and of each component of $\hat\beta$.

[Figures: histograms "Distribution of var.hat.w: var w = 25" and "Distribution of beta.hat. i : beta i" for i = 1, ..., 4, with true values beta = (2, 3, 4, 5)]
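A sketch for the multiple regression exercise, assuming the standard matrix least-squares formulas $\hat\beta = (X^T X)^{-1} X^T Y$, $\hat Y = X\hat\beta$, $\hat\sigma_w^2 = \sum_i (y_i - \hat y_i)^2/(N - p)$ and $\mathrm{Var}[\hat\beta] = \sigma_w^2 (X^T X)^{-1}$; we assume these match slides 33, 35 and 37. The seed and R <- 10000 are arbitrary choices. The sketch inverts $X^T X$ with base R's solve; MASS::ginv would be an alternative if $X^T X$ were ill-conditioned.

```r
set.seed(5)
n <- 3                        # number of input variables
p <- n + 1                    # number of parameters, including the intercept
beta <- seq(2, p + 1)         # beta = (2, 3, ..., n + 2)
sd.w <- 5
N <- 100                      # number of samples
X <- array(runif(N * n, min = -20, max = 20), c(N, n))
X <- cbind(array(1, c(N, 1)), X)   # N x p design matrix, first column of ones
R <- 10000                    # number of iterations (arbitrary choice)

XtX.inv <- solve(t(X) %*% X)
A <- XtX.inv %*% t(X)         # p x N matrix mapping Y to beta.hat
beta.hat <- array(0, c(p, R))
var.hat.w <- numeric(R)
Y.hat <- array(NA, c(R, N))

for (r in 1:R) {
  Y <- X %*% beta + rnorm(N, sd = sd.w)     # new data set D_N with the same X
  beta.hat[, r] <- A %*% Y                  # least-squares estimate
  Y.hat[r, ] <- X %*% beta.hat[, r]         # fitted values
  var.hat.w[r] <- sum((Y - Y.hat[r, ])^2) / (N - p)   # unbiased estimate of sigma_w^2
}

# No bias: empirical means versus true values
print(rbind(true = beta, mean.hat = rowMeans(beta.hat)))
print(c(true = sd.w^2, mean.hat = mean(var.hat.w)))

# Variance: empirical variances versus the diagonal of sigma_w^2 (X'X)^(-1)
print(rbind(theoretical = diag(sd.w^2 * XtX.inv),
            observed = apply(beta.hat, 1, var)))

# (b) Histograms
par(mfrow = c(2, 3))
hist(var.hat.w, main = paste("Distribution of var.hat.w: var w =", sd.w^2))
for (i in 1:p) {
  hist(beta.hat[i, ], main = paste("Distribution of beta.hat.", i, ": beta", i, "=", beta[i]))
}
```

Precomputing the matrix A outside the loop avoids repeating the inversion in every iteration, since X stays fixed across simulations.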

Session Info
R version ( ), x86_64-apple-darwin9.8.0
Base packages: base, datasets, grDevices, graphics, methods, stats, utils
Other packages: ISwR 2.0-6, MASS, knitr 1.1
Loaded via a namespace (and not attached): digest 0.5.2, evaluate 0.4.3, formatR 0.6, plyr 1.8, stringr 0.6.1, tools

Statistical foundations of Machine Learning INFO-F-422 TP: Prediction

Catharina Olsen and Gianluca Bontempi
March 25, 2013
1 Introduction: supervised learning
A supervised learning problem lets us study