Linear Regression: One-Dimensional Case

Given: a set of N input-response pairs. The inputs (x) and the responses (y) are one-dimensional scalars. Goal: model the relationship between x and y.

Let's assume the relationship between x and y is linear. A linear relationship can be defined by a straight line with parameter w. Equation of the straight line: y = wx.

The line may not fit the data exactly, but we can try making the line a reasonable approximation. Error for the pair (x_i, y_i): e_i = y_i − w x_i. The total squared error:

E = Σ_{i=1}^N e_i² = Σ_{i=1}^N (y_i − w x_i)²

The best-fitting line is defined by the w minimizing the total error E. Finding it requires just a little bit of calculus: take the derivative dE/dw = −2 Σ_{i=1}^N x_i (y_i − w x_i), equate it to zero, and solve, giving w = (Σ_i x_i y_i) / (Σ_i x_i²).
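Taking that derivation at face value, here is a minimal numerical sketch (NumPy, with invented synthetic data; not part of the original lecture) of the one-dimensional closed-form fit:

```python
import numpy as np

# Synthetic 1-D data: y ≈ 2.5 x plus noise (illustrative values only).
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.5 * x + rng.normal(0, 1.0, size=50)

# Minimizing E(w) = sum_i (y_i - w x_i)^2:
# dE/dw = -2 sum_i x_i (y_i - w x_i) = 0  =>  w = sum_i x_i y_i / sum_i x_i^2
w = np.sum(x * y) / np.sum(x ** 2)

E = np.sum((y - w * x) ** 2)  # total squared error at the optimum
print(f"w = {w:.3f}, total squared error = {E:.3f}")
```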

Analogy to line fitting: in higher dimensions, we will fit hyperplanes. For 2-dim. inputs, linear regression fits a 2-dim. plane to the data. Many planes are possible. Which one is the best? Intuition: choose the one which is (on average) closest to the responses y. Linear regression uses the sum-of-squared-error notion of closeness.

The same intuition carries over to higher dimensions too: fitting a D-dimensional hyperplane to the data (hard to visualize in pictures, though). The hyperplane is defined by parameters w (a D × 1 weight vector).
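For the two-dimensional picture, a small sketch (synthetic data and names of my own choosing, not from the slides) that lets NumPy's least-squares routine pick the best plane through the origin (the offset b is introduced later in these notes):

```python
import numpy as np

# Invented 2-dim. example: responses generated from a plane through the origin.
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(100, 2))          # N=100 inputs, D=2 features
y = X @ np.array([1.0, -2.0]) + rng.normal(0, 0.1, size=100)

# Of all candidate planes y = w^T x, least squares selects the one minimizing
# the sum of squared errors sum_i (y_i - w^T x_i)^2.
w, residuals, rank, sv = np.linalg.lstsq(X, y, rcond=None)
print("fitted w:", w, " sum of squared errors:", residuals)
```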

(Formally) Given training data D = {(x_1, y_1), ..., (x_N, y_N)}. Inputs x_i: D-dimensional vectors (ℝ^D); responses y_i: scalars (ℝ).

The linear model: the response is a linear function of the model parameters:

y = f(x, w) = b + Σ_{j=1}^M w_j φ_j(x)

The w_j's and b are the model parameters (b is an offset). The parameters define the mapping from the inputs to the responses. Each φ_j is called a basis function; basis functions allow a change of representation of the input x (often desired).

The linear model, with φ = [φ_1, ..., φ_M]:

y = b + Σ_{j=1}^M w_j φ_j(x) = b + wᵀφ(x)

w = [w_1, ..., w_M] is the weight vector (to be learned using the training data).

We consider the simplest case: φ(x) = x, so φ_j(x) is the j-th feature of the input (D features in total, so M = D). The linear model then becomes

y = b + Σ_{j=1}^D w_j x_j = b + wᵀx

Note: nonlinear relationships between x and y can be modeled using suitably chosen φ_j's (more on this when we cover Kernel Methods).
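To make the basis-function remark concrete, here is a hedged sketch (my own illustrative choice of φ_j, not from the lecture) using polynomial basis functions φ_j(x) = x^j, which keeps the model linear in w while fitting a nonlinear curve:

```python
import numpy as np

# Noisy samples of a nonlinear function of a scalar input x.
rng = np.random.default_rng(2)
x = np.linspace(-1, 1, 40)
y = np.sin(np.pi * x) + rng.normal(0, 0.1, size=40)

# Polynomial basis: phi(x) = [1, x, x^2, x^3]; the constant column plays the role of b.
M = 3
Phi = np.vander(x, M + 1, increasing=True)   # shape (40, M+1)

# The model y = w^T phi(x) is still linear in w, so least squares applies unchanged.
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print("fitted weights:", w)
```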

Given training data D = {(x_1, y_1), ..., (x_N, y_N)}, fit each training example (x_i, y_i) using the linear model:

y_i = b + wᵀx_i

With a bit of notation abuse, absorb the offset into the weight vector: write w = [b, w] and x_i = [1, x_i], so that

y_i = wᵀx_i
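The transcribed section stops here, but the same set-the-derivative-to-zero argument in matrix form yields the normal equations (XᵀX)w = Xᵀy, where X stacks the augmented inputs [1, x_i] as rows. A minimal sketch under that standard assumption, with invented data:

```python
import numpy as np

# Invented D-dimensional data: y ≈ w_true^T x + b_true + noise.
rng = np.random.default_rng(3)
N, D = 200, 4
X = rng.normal(size=(N, D))
y = X @ rng.normal(size=D) + 0.7 + rng.normal(0, 0.05, size=N)

# Notation-abuse trick from the slides: prepend a 1 to every input so the
# offset b becomes the first component of the weight vector.
X_aug = np.hstack([np.ones((N, 1)), X])

# Minimizing sum_i (y_i - w^T x_i)^2 gives the normal equations
# (X^T X) w = X^T y; solve the system rather than inverting the matrix.
w = np.linalg.solve(X_aug.T @ X_aug, X_aug.T @ y)
print("b =", w[0], " w =", w[1:])
```

In practice, np.linalg.lstsq or a QR factorization is preferred when XᵀX is ill-conditioned.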
