4: Regression (continued) CSC 4510 Machine Learning Dr. Mary Angela Papalaskari Department of Computing Sciences Villanova University Course website: www.csc.villanova.edu/~map/4510/ The slides in this presentation are adapted from: The Stanford online ML course http://www.ml-class.org/ 1
Last time: Introduction to linear regression; intuition: least squares approximation; intuition: gradient descent algorithm; hands-on: a simple example using Excel 2
Today: How to apply gradient descent to minimize the cost function for regression; linear algebra refresher 3
Reminder: sample problem. [Scatter plot: Housing Prices (Portland, OR) — Price (in 1000s of dollars) vs. Size (feet²)] 4
Reminder: Notation. Training set of housing prices (Portland, OR):
Size in feet² (x) | Price ($) in 1000's (y)
2104 | 460
1416 | 232
1534 | 315
 852 | 178
Notation: m = number of training examples; x's = input variable / features; y's = output variable / target variable 5
Reminder: Learning algorithm for hypothesis function h. Training Set → Learning Algorithm → h. Given the size of a house, h outputs an estimated price. Linear hypothesis: h_θ(x) = θ₀ + θ₁x (univariate linear regression) 6
Gradient descent algorithm: repeat until convergence { θ_j := θ_j − α ∂/∂θ_j J(θ₀, θ₁) (for j = 0 and j = 1) }. Linear regression model: h_θ(x) = θ₀ + θ₁x, with cost function J(θ₀, θ₁) = (1/2m) Σ_i (h_θ(x^(i)) − y^(i))² 8
Today: How to apply gradient descent to minimize the cost function for regression: 1. a closer look at the cost function; 2. applying gradient descent to find the minimum of the cost function; linear algebra refresher 9
Hypothesis: h_θ(x) = θ₀ + θ₁x. Parameters: θ₀, θ₁. Cost function: J(θ₀, θ₁) = (1/2m) Σ_i (h_θ(x^(i)) − y^(i))². Goal: minimize J(θ₀, θ₁) over θ₀, θ₁ 10
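A minimal Python/NumPy sketch of this cost function (not part of the original slides), using the four training examples from the notation slide; the function name compute_cost is an illustrative choice:

    import numpy as np

    # Training data from the housing example (size in ft^2, price in $1000s)
    x = np.array([2104.0, 1416.0, 1534.0, 852.0])
    y = np.array([460.0, 232.0, 315.0, 178.0])

    def compute_cost(theta0, theta1, x, y):
        """Squared-error cost J(theta0, theta1) = (1/2m) * sum((h(x) - y)^2)."""
        m = len(y)
        predictions = theta0 + theta1 * x      # h_theta(x) for every example
        errors = predictions - y
        return (1.0 / (2 * m)) * np.sum(errors ** 2)

    print(compute_cost(0.0, 0.2, x, y))        # cost for one particular choice of parameters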
Simplified case (θ₀ = 0). Hypothesis: h_θ(x) = θ₁x. Parameter: θ₁. Cost function: J(θ₁) = (1/2m) Σ_i (h_θ(x^(i)) − y^(i))². Goal: minimize J(θ₁) over θ₁ 11
h_θ(x) (for fixed θ₁, this is a function of x) and J(θ₁) (a function of the parameter θ₁), with θ₀ = 0. [Left: training data with the line h_θ(x) = x; right: J(θ₁) plotted against θ₁] 12
h_θ(x) (for fixed θ₁, this is a function of x) and J(θ₁) (a function of the parameter θ₁), with θ₀ = 0. [Plots for h_θ(x) = 0.5x] 13
h_θ(x) (for fixed θ₁, this is a function of x) and J(θ₁) (a function of the parameter θ₁), with θ₀ = 0. [Plots for h_θ(x) = 0] 14
What if θ₀ ≠ 0? Hypothesis: h_θ(x) = θ₀ + θ₁x. Parameters: θ₀, θ₁. Cost function: J(θ₀, θ₁) = (1/2m) Σ_i (h_θ(x^(i)) − y^(i))². Goal: minimize J(θ₀, θ₁) over θ₀, θ₁ 15
h_θ(x) (for fixed θ₀, θ₁, this is a function of x) and J(θ₀, θ₁) (a function of the parameters θ₀, θ₁). [Plot: Price ($) in 1000s vs. Size in feet² (x), with the line h_θ(x) = 10 + 0.1x] 16
17
Slides 18–21: h_θ(x) (for fixed θ₀, θ₁, this is a function of x) shown alongside the contour plot of J(θ₀, θ₁) (a function of the parameters θ₀, θ₁), for several choices of the parameters 18–21
Today: How to apply gradient descent to minimize the cost function for regression: 1. a closer look at the cost function; 2. applying gradient descent to find the minimum of the cost function; linear algebra refresher 22
Have some function J(θ₀, θ₁). Want min over θ₀, θ₁ of J(θ₀, θ₁). Gradient descent algorithm outline: start with some θ₀, θ₁; keep changing θ₀, θ₁ to reduce J(θ₀, θ₁) until we hopefully end up at a minimum 23
Have some function J(θ₀, θ₁). Want min over θ₀, θ₁ of J(θ₀, θ₁). Gradient descent algorithm: repeat until convergence { θ_j := θ_j − α ∂/∂θ_j J(θ₀, θ₁) (simultaneously for j = 0 and j = 1) } 24
Have some function J(θ₀, θ₁). Want min over θ₀, θ₁ of J(θ₀, θ₁). Gradient descent algorithm: repeat until convergence { θ_j := θ_j − α ∂/∂θ_j J(θ₀, θ₁) }, where α is the learning rate (it controls the size of each step) 25
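A minimal sketch of this generic update in Python (not part of the original slides); the gradient is passed in as a function, and the names gradient_descent_step and grad_J are illustrative:

    def gradient_descent_step(theta, grad_J, alpha):
        """One simultaneous update: theta_j := theta_j - alpha * dJ/dtheta_j."""
        gradients = grad_J(theta)                      # list of partial derivatives
        return [t - alpha * g for t, g in zip(theta, gradients)]

    # Example: minimize J(t0, t1) = t0^2 + t1^2, whose gradient is (2*t0, 2*t1)
    theta = [3.0, -2.0]
    for _ in range(100):
        theta = gradient_descent_step(theta, lambda t: [2 * t[0], 2 * t[1]], alpha=0.1)
    print(theta)   # approaches the minimum at (0, 0)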
If α is too small, gradient descent can be slow. If α is too large, gradient descent can overshoot the minimum. It may fail to converge, or even diverge. 26
At a local minimum the slope is zero, so the derivative term ∂/∂θ₁ J(θ₁) = 0 and the update θ₁ := θ₁ − α·0 leaves the current value of θ₁ unchanged 27
Gradient descent can converge to a local minimum, even with the learning rate α fixed. 28
Gradient descent algorithm: repeat until convergence { θ_j := θ_j − α ∂/∂θ_j J(θ₀, θ₁) (for j = 0 and j = 1) }. Linear regression model: h_θ(x) = θ₀ + θ₁x, J(θ₀, θ₁) = (1/2m) Σ_i (h_θ(x^(i)) − y^(i))² 29
Gradient descent algorithm for linear regression: repeat until convergence { θ₀ := θ₀ − α (1/m) Σ_i (h_θ(x^(i)) − y^(i)); θ₁ := θ₁ − α (1/m) Σ_i (h_θ(x^(i)) − y^(i))·x^(i) } — update θ₀ and θ₁ simultaneously 30
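A minimal sketch of these simultaneous updates on the housing data in Python/NumPy (an illustration, not the course's code); the rescaling, learning rate, and iteration count are arbitrary choices:

    import numpy as np

    x = np.array([2104.0, 1416.0, 1534.0, 852.0])   # size in ft^2
    y = np.array([460.0, 232.0, 315.0, 178.0])      # price in $1000s
    x = x / 1000.0                                  # simple rescaling so a fixed alpha behaves well
    m = len(y)

    theta0, theta1, alpha = 0.0, 0.0, 0.1
    for _ in range(2000):
        errors = (theta0 + theta1 * x) - y          # h_theta(x^(i)) - y^(i)
        new_theta0 = theta0 - alpha * (1.0 / m) * np.sum(errors)
        new_theta1 = theta1 - alpha * (1.0 / m) * np.sum(errors * x)
        theta0, theta1 = new_theta0, new_theta1     # simultaneous update

    print(theta0, theta1)   # fitted intercept and slope (slope is per 1000 ft^2 after rescaling)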
[Surface plots of the cost function J(θ₀, θ₁) over the (θ₀, θ₁) plane] 31–32
33
Slides 34–42: successive steps of gradient descent — on the left, h_θ(x) (for fixed θ₀, θ₁, this is a function of x) fit to the housing data; on the right, the corresponding point on the contour plot of J(θ₀, θ₁) (a function of the parameters θ₀, θ₁) 34–42
"Batch" Gradient Descent. "Batch": each step of gradient descent uses all the training examples. Alternative: process part of the dataset for each step of the algorithm. The slides in this presentation are adapted from: The Stanford online ML course http://www.ml-class.org/ 43
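A minimal sketch contrasting a batch step with a step that uses only part of the dataset (often called a mini-batch); the batch_size parameter and helper names are illustrative, not from the slides:

    import numpy as np

    def gradient(theta0, theta1, x, y):
        """Gradient of the squared-error cost over whatever examples are passed in."""
        m = len(y)
        errors = (theta0 + theta1 * x) - y
        return (1.0 / m) * np.sum(errors), (1.0 / m) * np.sum(errors * x)

    def step(theta0, theta1, x, y, alpha, batch_size=None):
        """One gradient descent step; batch_size=None means use all m examples (batch)."""
        if batch_size is not None:
            idx = np.random.choice(len(y), size=batch_size, replace=False)
            x, y = x[idx], y[idx]                   # only part of the dataset
        g0, g1 = gradient(theta0, theta1, x, y)
        return theta0 - alpha * g0, theta1 - alpha * g1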
What's next? We are not in univariate regression anymore:
x₀ (=1) | Size (feet²) | Number of bedrooms | Number of floors | Age of home (years) | Price ($1000)
1 | 2104 | 5 | 1 | 45 | 460
1 | 1416 | 3 | 2 | 40 | 232
1 | 1534 | 3 | 2 | 30 | 315
1 |  852 | 2 | 1 | 36 | 178
44
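Looking ahead to multivariate regression, each row of this table becomes a feature vector. A minimal NumPy sketch (an assumed representation, not from the slides) of the design matrix X with its leading column of ones and the target vector y:

    import numpy as np

    # Each row: [1, size (ft^2), bedrooms, floors, age (years)]
    X = np.array([
        [1, 2104, 5, 1, 45],
        [1, 1416, 3, 2, 40],
        [1, 1534, 3, 2, 30],
        [1,  852, 2, 1, 36],
    ], dtype=float)
    y = np.array([460.0, 232.0, 315.0, 178.0])   # price in $1000s

    print(X.shape)   # (4, 5): m = 4 training examples, 5 features including the constant 1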
Today: How to apply gradient descent to minimize the cost function for regression: 1. a closer look at the cost function; 2. applying gradient descent to find the minimum of the cost function; linear algebra refresher 47
Matrix: rectangular array of numbers. Matrix elements (entries of the matrix): A_ij = the entry in the i-th row, j-th column. Dimension of a matrix: number of rows × number of columns, e.g. 4 × 2 48
Vector: an n × 1 matrix. y_i = the i-th element of the vector. Vectors may be written 1-indexed (y₁, …, y_n) or 0-indexed (y₀, …, y_{n−1}) 49
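A small NumPy sketch (not from the slides, with made-up numbers) of matrix dimensions and element access; note that NumPy itself is 0-indexed, so A[0, 1] is the entry the slides would call A₁₂:

    import numpy as np

    A = np.array([[1402,  191],
                  [1371,  821],
                  [ 949, 1437],
                  [ 147, 1448]])      # a 4 x 2 matrix
    v = np.array([460, 232, 315])     # a 3-dimensional vector (3 x 1 in the slides' convention)

    print(A.shape)    # (4, 2): 4 rows, 2 columns
    print(A[0, 1])    # 191 -- row 1, column 2 in 1-indexed notation
    print(v[0])       # 460 -- first element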
Matrix Addition 50
Scalar Multiplication 51
Combination of Operands 52
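Slides 50–52 illustrate these operations with numeric examples; a minimal NumPy sketch (with made-up numbers) of matrix addition, scalar multiplication, and a combination of operands:

    import numpy as np

    A = np.array([[1.0, 0.0],
                  [2.0, 5.0],
                  [3.0, 1.0]])
    B = np.array([[4.0, 0.5],
                  [2.0, 5.0],
                  [0.0, 1.0]])

    print(A + B)        # matrix addition: add corresponding entries (shapes must match)
    print(3 * A)        # scalar multiplication: multiply every entry by 3
    print(A / 4 + B)    # combination of operands: scalar and matrix operations mixed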
Matrix-vector multiplication 53
Details: A is an m × n matrix (m rows, n columns), x is an n × 1 matrix (n-dimensional vector); the product y = Ax is an m-dimensional vector. To get y_i, multiply the elements of A's i-th row with the elements of the vector x, and add them up 54
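A minimal sketch of this rule in Python (illustrative; NumPy's @ operator computes the same product):

    import numpy as np

    def mat_vec(A, x):
        """y_i = sum over j of A[i][j] * x[j]  (i-th row of A dotted with x)."""
        return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]

    A = [[1, 3], [4, 0], [2, 1]]      # 3 x 2 matrix
    x = [1, 5]                        # 2-dimensional vector
    print(mat_vec(A, x))              # [16, 4, 7] -- a 3-dimensional vector
    print(np.array(A) @ np.array(x))  # same result with NumPy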
Example 55
House sizes: 56
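The slide computes predicted prices for a list of house sizes as a single matrix-vector product. A hedged sketch of the idea, assuming a hypothesis of roughly h_θ(x) = −40 + 0.25x (the exact parameters on the slide may differ):

    import numpy as np

    sizes = np.array([2104.0, 1416.0, 1534.0, 852.0])
    X = np.column_stack([np.ones(len(sizes)), sizes])   # prepend a column of 1s for the intercept
    theta = np.array([-40.0, 0.25])                     # assumed parameters: h(x) = -40 + 0.25x

    predictions = X @ theta                              # one matrix-vector product gives all predictions
    print(predictions)   # predicted prices in $1000s, e.g. 486.0 for the 2104 ft^2 house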
Example 57
Details: A is an m × n matrix (m rows, n columns), B is an n × o matrix (n rows, o columns); the product C = AB is an m × o matrix. The i-th column of C is obtained by multiplying A with the i-th column of B (for i = 1, 2, …, o) 58
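A minimal NumPy sketch (illustrative numbers) showing the column-by-column view of matrix-matrix multiplication:

    import numpy as np

    A = np.array([[1.0, 3.0],
                  [2.0, 5.0]])          # 2 x 2
    B = np.array([[0.0, 1.0],
                  [3.0, 2.0]])          # 2 x 2

    C = A @ B                           # 2 x 2 result
    print(C)                            # [[ 9.  7.] [15. 12.]]
    print(A @ B[:, 0])                  # i-th column of C = A times the i-th column of B
    print(C[:, 0])                      # same values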
Example 59
House sizes: have 3 competing hypotheses: 1. … 2. … 3. … Putting the house sizes in a data matrix and the parameters of the three hypotheses in a parameter matrix, one matrix-matrix product computes every hypothesis's prediction for every house 60
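A hedged sketch of this idea; the three hypotheses below are made-up stand-ins for the ones on the slide:

    import numpy as np

    sizes = np.array([2104.0, 1416.0, 1534.0, 852.0])
    X = np.column_stack([np.ones(len(sizes)), sizes])    # 4 x 2 data matrix [1, size]

    # Each column holds (intercept, slope) for one competing hypothesis (assumed values)
    Theta = np.array([[-40.0, 200.0, -150.0],
                      [ 0.25,   0.1,    0.4]])           # 2 x 3 parameter matrix

    predictions = X @ Theta                               # 4 x 3: column j = predictions of hypothesis j
    print(predictions)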
Let A and B be matrices. Then in general, AB ≠ BA (matrix multiplication is not commutative). E.g. 61
Let Let Compute Compute 62
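A quick NumPy check (illustrative numbers, not necessarily the slide's) that matrix multiplication is not commutative:

    import numpy as np

    A = np.array([[1, 1],
                  [0, 0]])
    B = np.array([[0, 0],
                  [2, 0]])

    print(A @ B)   # [[2, 0], [0, 0]]
    print(B @ A)   # [[0, 0], [2, 2]]  -- different result, so AB != BA in general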
Identity Matrix: denoted I (or I_{n×n}). Examples of identity matrices: 2 × 2, 3 × 3, 4 × 4. For any matrix A, A·I = I·A = A 63
Not all numbers have an inverse. Matrix inverse: if A is an m × m matrix, and if it has an inverse, then A·A⁻¹ = A⁻¹·A = I. Matrices that don't have an inverse are called singular or degenerate 64
Matrix Transpose. Example: let A be an m × n matrix, and let B = Aᵀ. Then B is an n × m matrix, and B_ij = A_ji 65
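A minimal NumPy sketch (illustrative numbers) of the identity, inverse, and transpose from the last three slides:

    import numpy as np

    A = np.array([[3.0, 4.0],
                  [2.0, 16.0]])

    I = np.eye(2)                      # 2 x 2 identity matrix
    print(np.allclose(A @ I, A))       # True: A * I = A

    A_inv = np.linalg.inv(A)           # inverse exists because A is non-singular
    print(np.allclose(A @ A_inv, I))   # True: A * A^{-1} = I

    print(A.T)                         # transpose: (A^T)_ij = A_ji

    # A singular (degenerate) matrix has no inverse:
    # np.linalg.inv(np.array([[1.0, 2.0], [2.0, 4.0]]))  would raise LinAlgError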