4: Regression (continued) CSC 4510 Machine Learning Dr. Mary Angela Papalaskari Department of Computing Sciences Villanova University Course website: www.csc.villanova.edu/~map/4510/ The slides in this presentation are adapted from: The Stanford online ML course http://www.ml-class.org/ 1
Last time: Introduction to linear regression; intuition: least squares approximation; intuition: gradient descent algorithm; hands-on: a simple example using Excel 2
Today: How to apply gradient descent to minimize the cost function for regression; linear algebra refresher 3
Reminder: sample problem. [Scatter plot: Housing Prices (Portland, OR) — Price (in 1000s of dollars) vs. Size (feet²)] 4
Reminder: Notation. Training set of housing prices (Portland, OR):
Size in feet² (x) | Price ($) in 1000's (y)
2104 | 460
1416 | 232
1534 | 315
 852 | 178
Notation: m = number of training examples; x's = input variable / features; y's = output variable / target variable 5
Reminder: Learning algorithm for hypothesis function h. Training Set → Learning Algorithm → h. Given the size of a house, h outputs an estimated price. Linear hypothesis: h_θ(x) = θ₀ + θ₁x (univariate linear regression) 6
Gradient descent algorithm: repeat until convergence { θ_j := θ_j − α ∂/∂θ_j J(θ₀, θ₁) (for j = 0 and j = 1) }. Linear regression model: h_θ(x) = θ₀ + θ₁x, with cost function J(θ₀, θ₁) = (1/2m) Σ_i (h_θ(x^(i)) − y^(i))² 8
Today: How to apply gradient descent to minimize the cost function for regression: 1. a closer look at the cost function; 2. applying gradient descent to find the minimum of the cost function; linear algebra refresher 9
Hypothesis: h_θ(x) = θ₀ + θ₁x. Parameters: θ₀, θ₁. Cost function: J(θ₀, θ₁) = (1/2m) Σ_i (h_θ(x^(i)) − y^(i))². Goal: minimize J(θ₀, θ₁) over θ₀, θ₁ 10
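A minimal Python/NumPy sketch of this cost function (not part of the original slides), using the four training examples from the notation slide; the function name compute_cost is an illustrative choice:

    import numpy as np

    # Training data from the housing example (size in ft^2, price in $1000s)
    x = np.array([2104.0, 1416.0, 1534.0, 852.0])
    y = np.array([460.0, 232.0, 315.0, 178.0])

    def compute_cost(theta0, theta1, x, y):
        """Squared-error cost J(theta0, theta1) = (1/2m) * sum((h(x) - y)^2)."""
        m = len(y)
        predictions = theta0 + theta1 * x      # h_theta(x) for every example
        errors = predictions - y
        return (1.0 / (2 * m)) * np.sum(errors ** 2)

    print(compute_cost(0.0, 0.2, x, y))        # cost for one particular choice of parameters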
Simplified case (θ₀ = 0). Hypothesis: h_θ(x) = θ₁x. Parameter: θ₁. Cost function: J(θ₁) = (1/2m) Σ_i (h_θ(x^(i)) − y^(i))². Goal: minimize J(θ₁) over θ₁ 11
h_θ(x) (for fixed θ₁, this is a function of x) and J(θ₁) (a function of the parameter θ₁), with θ₀ = 0. [Left: training data with the line h_θ(x) = x; right: J(θ₁) plotted against θ₁] 12
h_θ(x) (for fixed θ₁, this is a function of x) and J(θ₁) (a function of the parameter θ₁), with θ₀ = 0. [Plots for h_θ(x) = 0.5x] 13
h_θ(x) (for fixed θ₁, this is a function of x) and J(θ₁) (a function of the parameter θ₁), with θ₀ = 0. [Plots for h_θ(x) = 0] 14
What if θ₀ ≠ 0? Hypothesis: h_θ(x) = θ₀ + θ₁x. Parameters: θ₀, θ₁. Cost function: J(θ₀, θ₁) = (1/2m) Σ_i (h_θ(x^(i)) − y^(i))². Goal: minimize J(θ₀, θ₁) over θ₀, θ₁ 15
h_θ(x) (for fixed θ₀, θ₁, this is a function of x) and J(θ₀, θ₁) (a function of the parameters θ₀, θ₁). [Plot: Price ($) in 1000s vs. Size in feet² (x), with the line h_θ(x) = 10 + 0.1x] 16
17
Slides 18–21: h_θ(x) (for fixed θ₀, θ₁, this is a function of x) shown alongside the contour plot of J(θ₀, θ₁) (a function of the parameters θ₀, θ₁), for several choices of the parameters 18–21
Today: How to apply gradient descent to minimize the cost function for regression: 1. a closer look at the cost function; 2. applying gradient descent to find the minimum of the cost function; linear algebra refresher 22
Have some function J(θ₀, θ₁). Want min over θ₀, θ₁ of J(θ₀, θ₁). Gradient descent algorithm outline: start with some θ₀, θ₁; keep changing θ₀, θ₁ to reduce J(θ₀, θ₁) until we hopefully end up at a minimum 23
Have some function J(θ₀, θ₁). Want min over θ₀, θ₁ of J(θ₀, θ₁). Gradient descent algorithm: repeat until convergence { θ_j := θ_j − α ∂/∂θ_j J(θ₀, θ₁) (simultaneously for j = 0 and j = 1) } 24
Have some function J(θ₀, θ₁). Want min over θ₀, θ₁ of J(θ₀, θ₁). Gradient descent algorithm: repeat until convergence { θ_j := θ_j − α ∂/∂θ_j J(θ₀, θ₁) }, where α is the learning rate (it controls the size of each step) 25
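A minimal sketch of this generic update in Python (not part of the original slides); the gradient is passed in as a function, and the names gradient_descent_step and grad_J are illustrative:

    def gradient_descent_step(theta, grad_J, alpha):
        """One simultaneous update: theta_j := theta_j - alpha * dJ/dtheta_j."""
        gradients = grad_J(theta)                      # list of partial derivatives
        return [t - alpha * g for t, g in zip(theta, gradients)]

    # Example: minimize J(t0, t1) = t0^2 + t1^2, whose gradient is (2*t0, 2*t1)
    theta = [3.0, -2.0]
    for _ in range(100):
        theta = gradient_descent_step(theta, lambda t: [2 * t[0], 2 * t[1]], alpha=0.1)
    print(theta)   # approaches the minimum at (0, 0)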
If α is too small, gradient descent can be slow. If α is too large, gradient descent can overshoot the minimum. It may fail to converge, or even diverge. 26
At a local minimum the slope is zero, so the derivative term ∂/∂θ₁ J(θ₁) = 0 and the update θ₁ := θ₁ − α·0 leaves the current value of θ₁ unchanged 27
Gradient descent can converge to a local minimum, even with the learning rate α fixed. 28
Gradient descent algorithm: repeat until convergence { θ_j := θ_j − α ∂/∂θ_j J(θ₀, θ₁) (for j = 0 and j = 1) }. Linear regression model: h_θ(x) = θ₀ + θ₁x, J(θ₀, θ₁) = (1/2m) Σ_i (h_θ(x^(i)) − y^(i))² 29
Gradient descent algorithm for linear regression: repeat until convergence { θ₀ := θ₀ − α (1/m) Σ_i (h_θ(x^(i)) − y^(i)); θ₁ := θ₁ − α (1/m) Σ_i (h_θ(x^(i)) − y^(i))·x^(i) } — update θ₀ and θ₁ simultaneously 30
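A minimal sketch of these simultaneous updates on the housing data in Python/NumPy (an illustration, not the course's code); the rescaling, learning rate, and iteration count are arbitrary choices:

    import numpy as np

    x = np.array([2104.0, 1416.0, 1534.0, 852.0])   # size in ft^2
    y = np.array([460.0, 232.0, 315.0, 178.0])      # price in $1000s
    x = x / 1000.0                                  # simple rescaling so a fixed alpha behaves well
    m = len(y)

    theta0, theta1, alpha = 0.0, 0.0, 0.1
    for _ in range(2000):
        errors = (theta0 + theta1 * x) - y          # h_theta(x^(i)) - y^(i)
        new_theta0 = theta0 - alpha * (1.0 / m) * np.sum(errors)
        new_theta1 = theta1 - alpha * (1.0 / m) * np.sum(errors * x)
        theta0, theta1 = new_theta0, new_theta1     # simultaneous update

    print(theta0, theta1)   # fitted intercept and slope (slope is per 1000 ft^2 after rescaling)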
[Surface plots of the cost function J(θ₀, θ₁) over the (θ₀, θ₁) plane] 31–32
33
Slides 34–42: successive steps of gradient descent — on the left, h_θ(x) (for fixed θ₀, θ₁, this is a function of x) fit to the housing data; on the right, the corresponding point on the contour plot of J(θ₀, θ₁) (a function of the parameters θ₀, θ₁) 34–42
"Batch" Gradient Descent. "Batch": each step of gradient descent uses all the training examples. Alternative: process part of the dataset for each step of the algorithm. The slides in this presentation are adapted from: The Stanford online ML course http://www.ml-class.org/ 43
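A minimal sketch contrasting a batch step with a step that uses only part of the dataset (often called a mini-batch); the batch_size parameter and helper names are illustrative, not from the slides:

    import numpy as np

    def gradient(theta0, theta1, x, y):
        """Gradient of the squared-error cost over whatever examples are passed in."""
        m = len(y)
        errors = (theta0 + theta1 * x) - y
        return (1.0 / m) * np.sum(errors), (1.0 / m) * np.sum(errors * x)

    def step(theta0, theta1, x, y, alpha, batch_size=None):
        """One gradient descent step; batch_size=None means use all m examples (batch)."""
        if batch_size is not None:
            idx = np.random.choice(len(y), size=batch_size, replace=False)
            x, y = x[idx], y[idx]                   # only part of the dataset
        g0, g1 = gradient(theta0, theta1, x, y)
        return theta0 - alpha * g0, theta1 - alpha * g1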
What's next? We are not in univariate regression anymore:
x₀ (=1) | Size (feet²) | Number of bedrooms | Number of floors | Age of home (years) | Price ($1000)
1 | 2104 | 5 | 1 | 45 | 460
1 | 1416 | 3 | 2 | 40 | 232
1 | 1534 | 3 | 2 | 30 | 315
1 |  852 | 2 | 1 | 36 | 178
44
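Looking ahead to multivariate regression, each row of this table becomes a feature vector. A minimal NumPy sketch (an assumed representation, not from the slides) of the design matrix X with its leading column of ones and the target vector y:

    import numpy as np

    # Each row: [1, size (ft^2), bedrooms, floors, age (years)]
    X = np.array([
        [1, 2104, 5, 1, 45],
        [1, 1416, 3, 2, 40],
        [1, 1534, 3, 2, 30],
        [1,  852, 2, 1, 36],
    ], dtype=float)
    y = np.array([460.0, 232.0, 315.0, 178.0])   # price in $1000s

    print(X.shape)   # (4, 5): m = 4 training examples, 5 features including the constant 1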
Today: How to apply gradient descent to minimize the cost function for regression: 1. a closer look at the cost function; 2. applying gradient descent to find the minimum of the cost function; linear algebra refresher 47
Matrix: rectangular array of numbers. Matrix elements (entries of the matrix): A_ij = the entry in the i-th row, j-th column. Dimension of a matrix: number of rows × number of columns, e.g. 4 × 2 48
Vector: an n × 1 matrix. y_i = the i-th element of the vector. Vectors may be written 1-indexed (y₁, …, y_n) or 0-indexed (y₀, …, y_{n−1}) 49
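A small NumPy sketch (not from the slides, with made-up numbers) of matrix dimensions and element access; note that NumPy itself is 0-indexed, so A[0, 1] is the entry the slides would call A₁₂:

    import numpy as np

    A = np.array([[1402,  191],
                  [1371,  821],
                  [ 949, 1437],
                  [ 147, 1448]])      # a 4 x 2 matrix
    v = np.array([460, 232, 315])     # a 3-dimensional vector (3 x 1 in the slides' convention)

    print(A.shape)    # (4, 2): 4 rows, 2 columns
    print(A[0, 1])    # 191 -- row 1, column 2 in 1-indexed notation
    print(v[0])       # 460 -- first element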
Matrix Addition 50
Scalar Multiplication 51
Combination of Operands 52
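Slides 50–52 illustrate these operations with numeric examples; a minimal NumPy sketch (with made-up numbers) of matrix addition, scalar multiplication, and a combination of operands:

    import numpy as np

    A = np.array([[1.0, 0.0],
                  [2.0, 5.0],
                  [3.0, 1.0]])
    B = np.array([[4.0, 0.5],
                  [2.0, 5.0],
                  [0.0, 1.0]])

    print(A + B)        # matrix addition: add corresponding entries (shapes must match)
    print(3 * A)        # scalar multiplication: multiply every entry by 3
    print(A / 4 + B)    # combination of operands: scalar and matrix operations mixed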
Matrix-vector multiplication 53
Details: A is an m × n matrix (m rows, n columns), x is an n × 1 matrix (n-dimensional vector); the product y = Ax is an m-dimensional vector. To get y_i, multiply the elements of A's i-th row with the elements of the vector x, and add them up 54
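A minimal sketch of this rule in Python (illustrative; NumPy's @ operator computes the same product):

    import numpy as np

    def mat_vec(A, x):
        """y_i = sum over j of A[i][j] * x[j]  (i-th row of A dotted with x)."""
        return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

    A = [[1, 3], [4, 0], [2, 1]]      # 3 x 2 matrix
    x = [1, 5]                        # 2-dimensional vector
    print(mat_vec(A, x))              # [16, 4, 7] -- a 3-dimensional vector
    print(np.array(A) @ np.array(x))  # same result with NumPy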
Example 55
House sizes: 56
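The slide computes predicted prices for a list of house sizes as a single matrix-vector product. A hedged sketch of the idea, assuming a hypothesis of roughly h_θ(x) = −40 + 0.25x (the exact parameters on the slide may differ):

    import numpy as np

    sizes = np.array([2104.0, 1416.0, 1534.0, 852.0])
    X = np.column_stack([np.ones(len(sizes)), sizes])   # prepend a column of 1s for the intercept
    theta = np.array([-40.0, 0.25])                     # assumed parameters: h(x) = -40 + 0.25x

    predictions = X @ theta                              # one matrix-vector product gives all predictions
    print(predictions)   # predicted prices in $1000s, e.g. 486.0 for the 2104 ft^2 house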
Example 57
Details: A is an m × n matrix (m rows, n columns), B is an n × o matrix (n rows, o columns); the product C = AB is an m × o matrix. The i-th column of C is obtained by multiplying A with the i-th column of B (for i = 1, 2, …, o) 58
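A minimal NumPy sketch (illustrative numbers) showing the column-by-column view of matrix-matrix multiplication:

    import numpy as np

    A = np.array([[1.0, 3.0],
                  [2.0, 5.0]])          # 2 x 2
    B = np.array([[0.0, 1.0],
                  [3.0, 2.0]])          # 2 x 2

    C = A @ B                           # 2 x 2 result
    print(C)                            # [[ 9.  7.] [15. 12.]]
    print(A @ B[:, 0])                  # i-th column of C = A times the i-th column of B
    print(C[:, 0])                      # same values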
Example 59
House sizes: have 3 competing hypotheses: 1. … 2. … 3. … Putting the house sizes in a data matrix and the parameters of the three hypotheses in a parameter matrix, one matrix-matrix product computes every hypothesis's prediction for every house 60
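A hedged sketch of this idea; the three hypotheses below are made-up stand-ins for the ones on the slide:

    import numpy as np

    sizes = np.array([2104.0, 1416.0, 1534.0, 852.0])
    X = np.column_stack([np.ones(len(sizes)), sizes])    # 4 x 2 data matrix [1, size]

    # Each column holds (intercept, slope) for one competing hypothesis (assumed values)
    Theta = np.array([[-40.0, 200.0, -150.0],
                      [ 0.25,   0.1,    0.4]])           # 2 x 3 parameter matrix

    predictions = X @ Theta                               # 4 x 3: column j = predictions of hypothesis j
    print(predictions)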
Let A and B be matrices. Then in general, AB ≠ BA (matrix multiplication is not commutative). E.g. 61
Let Let Compute Compute 62
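A quick NumPy check (illustrative numbers, not necessarily the slide's) that matrix multiplication is not commutative:

    import numpy as np

    A = np.array([[1, 1],
                  [0, 0]])
    B = np.array([[0, 0],
                  [2, 0]])

    print(A @ B)   # [[2, 0], [0, 0]]
    print(B @ A)   # [[0, 0], [2, 2]]  -- different result, so AB != BA in general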
Identity Matrix: denoted I (or I_{n×n}). Examples of identity matrices: 2 × 2, 3 × 3, 4 × 4. For any matrix A, A·I = I·A = A 63
Not all numbers have an inverse. Matrix inverse: if A is an m × m matrix, and if it has an inverse, then A·A⁻¹ = A⁻¹·A = I. Matrices that don't have an inverse are called singular or degenerate 64
Matrix Transpose. Example: let A be an m × n matrix, and let B = Aᵀ. Then B is an n × m matrix, and B_ij = A_ji 65
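A minimal NumPy sketch (illustrative numbers) of the identity, inverse, and transpose from the last three slides:

    import numpy as np

    A = np.array([[3.0, 4.0],
                  [2.0, 16.0]])

    I = np.eye(2)                      # 2 x 2 identity matrix
    print(np.allclose(A @ I, A))       # True: A * I = A

    A_inv = np.linalg.inv(A)           # inverse exists because A is non-singular
    print(np.allclose(A @ A_inv, I))   # True: A * A^{-1} = I

    print(A.T)                         # transpose: (A^T)_ij = A_ji

    # A singular (degenerate) matrix has no inverse:
    # np.linalg.inv(np.array([[1.0, 2.0], [2.0, 4.0]]))  would raise LinAlgError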