CSC 4510 Machine Learning

Adela Cameron
6 years ago

1 4: Regression (continued). CSC 4510 Machine Learning, Dr. Mary Angela Papalaskari, Department of Computing Sciences, Villanova University. Course website: The slides in this presentation are adapted from: The Stanford online ML course http:// class.org/

2 Last time: Introduction to linear regression. Intuition: least squares approximation. Intuition: gradient descent algorithm. Hands on: simple example using Excel.

3 Today: How to apply gradient descent to minimize the cost function for regression. Linear algebra refresher.

4 Reminder: sample problem. Housing Prices (Portland, OR). [Plot: Price (in 1000s of dollars) vs. Size (feet²)]

5 Reminder: Notation. Training set of housing prices (Portland, OR): Size in feet² (x), Price ($) in 1000's (y). Notation: m = number of training examples; x's = input variable / features; y's = output variable / target variable.

6 Reminder: Learning algorithm for hypothesis function h. Training Set → Learning Algorithm → h. Size of house → h → Estimated price. Linear hypothesis: h_θ(x) = θ_0 + θ_1 x (univariate linear regression).

8 Gradient descent algorithm. Linear Regression Model.

9 Today: How to apply gradient descent to minimize the cost function for regression: 1. a closer look at the cost function; 2. applying gradient descent to find the minimum of the cost function. Linear algebra refresher.

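10 Hypothesis: h_θ(x) = θ_0 + θ_1 x. Parameters: θ_0, θ_1. Cost Function: J(θ_0, θ_1) = (1/(2m)) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))^2. Goal: minimize J(θ_0, θ_1) over θ_0, θ_1.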
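A minimal Python sketch of the hypothesis and the squared-error cost may make the slide concrete. It assumes the usual 1/(2m) scaling from the Stanford course the slides are adapted from; the function names and the three-point training set are hypothetical and not taken from the slides.

```python
def hypothesis(theta0, theta1, x):
    """Linear hypothesis h_theta(x) = theta0 + theta1 * x."""
    return theta0 + theta1 * x


def cost(theta0, theta1, xs, ys):
    """Squared-error cost J(theta0, theta1) = 1/(2m) * sum of squared errors."""
    m = len(xs)
    total = sum((hypothesis(theta0, theta1, x) - y) ** 2 for x, y in zip(xs, ys))
    return total / (2 * m)


# Hypothetical training set (not from the slides), lying exactly on y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

perfect_fit = cost(0.0, 2.0, xs, ys)  # the line y = 2x passes through every point
poor_fit = cost(0.0, 0.0, xs, ys)     # the line y = 0 misses every point
print(perfect_fit, poor_fit)
```

The goal stated on the slide corresponds to finding the parameter pair for which `cost` is smallest.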

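11 Simplified (θ_0 = 0): Hypothesis: h_θ(x) = θ_1 x. Parameters: θ_1. Cost Function: J(θ_1) = (1/(2m)) Σ_{i=1}^{m} (h_θ(x^(i)) − y^(i))^2. Goal: minimize J(θ_1) over θ_1.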

12 h_θ(x) (for fixed θ_1, this is a function of x) shown beside J(θ_1) (function of the parameter θ_1), with θ_0 = 0. [Plot: y vs. x] h_θ(x) = x

13 Same pair of plots, with θ_0 = 0 and h_θ(x) = 0.5x

14 Same pair of plots, with θ_0 = 0 and h_θ(x) = 0
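The three hypotheses above (slopes 1, 0.5 and 0) can be compared numerically. The sketch below assumes a training set of three points lying on the line y = x; the plotted data did not survive extraction, so these points are an assumption chosen for illustration.

```python
def cost_simplified(theta1, xs, ys):
    """J(theta1) for the simplified hypothesis h(x) = theta1 * x (theta0 = 0)."""
    m = len(xs)
    return sum((theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


# Assumed training set (hypothetical): three points on the line y = x.
xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0]

# Evaluate the cost for the three slopes discussed on the slides.
costs = {theta1: cost_simplified(theta1, xs, ys) for theta1 in (1.0, 0.5, 0.0)}
for theta1, j in costs.items():
    print(f"theta1 = {theta1}: J = {j:.3f}")
```

With this data each choice of θ_1 gives one point on the J(θ_1) curve, and the cost grows as the slope moves away from 1.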

15 What if θ_0 ≠ 0? Hypothesis: h_θ(x) = θ_0 + θ_1 x. Parameters: θ_0, θ_1. Cost Function: J(θ_0, θ_1). Goal: minimize J(θ_0, θ_1) over θ_0, θ_1.

16-21 [Figures: h_θ(x) (for fixed θ_0, θ_1, this is a function of x), plotted as Price ($) in 1000s against Size in feet² (x), shown beside J(θ_0, θ_1) (function of the parameters θ_0, θ_1)]

22 Today: How to apply gradient descent to minimize the cost function for regression: 1. a closer look at the cost function; 2. applying gradient descent to find the minimum of the cost function. Linear algebra refresher.

23 Have some function J(θ_0, θ_1). Want the minimum of J(θ_0, θ_1) over θ_0, θ_1. Gradient descent algorithm outline: Start with some θ_0, θ_1. Keep changing θ_0, θ_1 to reduce J(θ_0, θ_1) until we hopefully end up at a minimum.

24-25 Have some function J(θ_0, θ_1). Want the minimum of J(θ_0, θ_1) over θ_0, θ_1. Gradient descent algorithm: repeat until convergence: θ_j := θ_j − α ∂J(θ_0, θ_1)/∂θ_j (for j = 0 and j = 1), where α is the learning rate.
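The outline of gradient descent can be sketched in a few lines of Python. The function being minimized here is a hypothetical bowl-shaped J with its minimum at (1, -2), chosen only so the result is easy to check; `gradient_descent` and `grad_J` are illustrative names, not part of the slides.

```python
def gradient_descent(grad, theta, alpha, steps):
    """Repeatedly move theta against the gradient, scaled by the learning rate alpha."""
    theta = list(theta)
    for _ in range(steps):
        g = grad(theta)
        # All components are updated from the same old theta (simultaneous update).
        theta = [t - alpha * gi for t, gi in zip(theta, g)]
    return theta


def grad_J(theta):
    """Gradient of the hypothetical J(theta0, theta1) = (theta0 - 1)^2 + (theta1 + 2)^2."""
    return [2 * (theta[0] - 1), 2 * (theta[1] + 2)]


result = gradient_descent(grad_J, [0.0, 0.0], alpha=0.1, steps=200)
print(result)  # close to [1, -2]
```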

26 If α is too small, gradient descent can be slow. If α is too large, gradient descent can overshoot the minimum. It may fail to converge, or even diverge.
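The effect of the learning rate can be seen on a one-parameter toy problem. The sketch assumes J(θ) = θ², whose derivative is 2θ and whose minimum is at θ = 0; the three values of α are arbitrary picks meant to show the slow, well-behaved and diverging cases.

```python
def run(alpha, steps=20, theta=1.0):
    """Run gradient descent on the toy cost J(theta) = theta^2 and return the final theta."""
    for _ in range(steps):
        theta = theta - alpha * 2 * theta  # dJ/dtheta = 2 * theta
    return theta


slow = run(0.001)     # barely moves away from the starting point 1.0
good = run(0.4)       # shrinks quickly toward the minimum at 0
diverged = run(1.1)   # each step overshoots and lands farther from 0
print(slow, good, diverged)
```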

27 At a local minimum the derivative term is zero, so the update leaves the current value of θ_1 unchanged.

28 Gradient descent can converge to a local minimum, even with the learning rate α fixed.
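One way to see why a fixed α can still converge: the update is α times the derivative, and the derivative shrinks as the minimum gets closer, so the steps shrink on their own. The sketch below records the step sizes on the toy cost J(θ) = θ² (an assumed example, not from the slides).

```python
def step_sizes(alpha, theta, steps):
    """Return the absolute size of each gradient descent step on J(theta) = theta^2."""
    sizes = []
    for _ in range(steps):
        update = alpha * 2 * theta  # alpha * dJ/dtheta
        sizes.append(abs(update))
        theta = theta - update
    return sizes


sizes = step_sizes(alpha=0.1, theta=1.0, steps=5)
print(sizes)  # each step is smaller than the one before, with alpha never changed
```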

29 Gradient descent algorithm. Linear Regression Model.
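Putting the two halves of this slide together means working out the partial derivatives of the cost. The derivation below is a sketch that assumes the squared-error cost with the 1/(2m) scaling used in the Stanford course the slides are adapted from; the formulas on the slide itself did not survive extraction.

```latex
% Gradient descent update (repeat until convergence), for j = 0 and j = 1
\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)

% Linear regression model and cost
h_\theta(x) = \theta_0 + \theta_1 x, \qquad
J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \bigl( h_\theta(x^{(i)}) - y^{(i)} \bigr)^2

% Partial derivatives, by the chain rule
\frac{\partial J}{\partial \theta_0} = \frac{1}{m} \sum_{i=1}^{m} \bigl( h_\theta(x^{(i)}) - y^{(i)} \bigr), \qquad
\frac{\partial J}{\partial \theta_1} = \frac{1}{m} \sum_{i=1}^{m} \bigl( h_\theta(x^{(i)}) - y^{(i)} \bigr) \, x^{(i)}
```

The factor 2 from differentiating the square cancels the 1/2 in the cost, which is the usual reason given for that scaling.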

30 Gradient descent algorithm: update θ_0 and θ_1 simultaneously.
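A sketch of the simultaneous update for univariate linear regression follows. It assumes the squared-error cost with 1/(2m) scaling; the data set (four points on the line y = 1 + 2x), the learning rate and the iteration count are hypothetical choices made for illustration.

```python
def gradient_step(theta0, theta1, xs, ys, alpha):
    """One gradient descent step for h(x) = theta0 + theta1 * x."""
    m = len(xs)
    errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
    grad0 = sum(errors) / m
    grad1 = sum(e * x for e, x in zip(errors, xs)) / m
    # Simultaneous update: both new values are computed from the OLD theta0, theta1.
    return theta0 - alpha * grad0, theta1 - alpha * grad1


# Hypothetical training set lying exactly on y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

theta0, theta1 = 0.0, 0.0
for _ in range(5000):
    theta0, theta1 = gradient_step(theta0, theta1, xs, ys, alpha=0.05)
print(theta0, theta1)  # close to 1 and 2
```

Returning both values as a tuple makes the update simultaneous by construction: assigning `theta0` first and then using the new value when computing `grad1` would be a different (and incorrect) algorithm.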

31-33 [Figures: surface plot of J(θ_0, θ_1) over the θ_0 and θ_1 axes]

34-42 [Figures: h_θ(x) (for fixed θ_0, θ_1, this is a function of x) shown beside J(θ_0, θ_1) (function of the parameters θ_0, θ_1)]

43 Batch Gradient Descent. "Batch": Each step of gradient descent uses all the training examples. Alternative: process part of the dataset for each step of the algorithm.
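The difference between the two approaches comes down to which examples feed the gradient. In the sketch below the same `gradient` function is called once on the whole (hypothetical) training set and once on the first two examples only; the partial-dataset variant is commonly called mini-batch gradient descent, a name the slide does not use.

```python
def gradient(theta0, theta1, xs, ys):
    """Gradient of the squared-error cost over the examples that are passed in."""
    m = len(xs)
    errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
    return sum(errors) / m, sum(e * x for e, x in zip(errors, xs)) / m


# Hypothetical training set (not from the slides).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

batch_grad = gradient(0.0, 0.0, xs, ys)          # batch: uses all 4 examples
mini_grad = gradient(0.0, 0.0, xs[:2], ys[:2])   # alternative: uses only the first 2
print(batch_grad, mini_grad)
```

The partial gradient is cheaper to compute but only approximates the full one, which is the trade-off behind the alternative.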

44-46 What's next? We are not in univariate regression anymore. Features: Size (feet²), Number of bedrooms, Number of floors, Age of home (years). Target: Price ($1000).

47 Today: How to apply gradient descent to minimize the cost function for regression: 1. a closer look at the cost function; 2. applying gradient descent to find the minimum of the cost function. Linear algebra refresher.

48 Matrix: Rectangular array of numbers. Matrix elements (entries of matrix): A_ij = entry in the i-th row, j-th column. Dimension of matrix: number of rows × number of columns, e.g. 4 × 2.

49 Vector: An n × 1 matrix. y_i = i-th element. 1-indexed vs. 0-indexed vectors.
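In code, a matrix can be held as a list of rows, and the 1-indexed notation of the slides has to be shifted by one for 0-indexed languages such as Python. The sketch below uses made-up numbers for a 4 × 2 matrix and a 4-dimensional vector.

```python
# Hypothetical 4 x 2 matrix (4 rows, 2 columns), stored as a list of rows.
A = [[1402, 191],
     [1371, 821],
     [949, 1437],
     [147, 1448]]

rows, cols = len(A), len(A[0])  # dimension: rows x cols

# The entry written A_32 in 1-indexed math notation (row 3, column 2)
# lives at A[2][1] in 0-indexed Python.
entry_3_2 = A[3 - 1][2 - 1]

# Hypothetical 4-dimensional vector; y_1 in math notation is y[0] in Python.
y = [460, 232, 315, 178]
y_1 = y[1 - 1]

print(rows, cols, entry_3_2, y_1)
```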

50 Matrix Addition

51 Scalar Multiplication

52 Combination of Operands
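The three operations named on these slides are all element-wise. The sketch below uses hypothetical 3 × 2 matrices (the worked examples on the slides were lost in extraction) and illustrative helper names `mat_add` and `scalar_mul`.

```python
def mat_add(A, B):
    """Element-wise sum; defined only for matrices of the same dimension."""
    if len(A) != len(B) or any(len(ra) != len(rb) for ra, rb in zip(A, B)):
        raise ValueError("matrices must have the same dimension")
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def scalar_mul(c, A):
    """Multiply every entry of A by the scalar c."""
    return [[c * a for a in row] for row in A]


# Hypothetical 3 x 2 matrices.
A = [[1, 0], [2, 5], [3, 1]]
B = [[4, 0.5], [2, 5], [0, 1]]

total = mat_add(A, B)                                  # A + B
tripled = scalar_mul(3, A)                             # 3 * A
combo = mat_add(scalar_mul(3, A), scalar_mul(-1, B))   # 3 * A - B
print(total, tripled, combo)
```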

53 Matrix-vector multiplication

54 Details: A (m × n matrix: m rows, n columns) × x (n × 1 matrix: n-dimensional vector) = y (m-dimensional vector). To get y_i, multiply A's i-th row with the elements of vector x, and add them up.
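The rule above translates almost word for word into code. This is a minimal sketch with a hypothetical 3 × 2 matrix and 2-dimensional vector; `mat_vec` is an illustrative name.

```python
def mat_vec(A, x):
    """Multiply an m x n matrix by an n-dimensional vector, giving an m-dimensional vector."""
    if any(len(row) != len(x) for row in A):
        raise ValueError("number of columns of A must equal the length of x")
    # y_i: multiply the i-th row of A with the elements of x and add them up.
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


A = [[1, 3],
     [4, 0],
     [2, 1]]   # hypothetical 3 x 2 matrix
x = [1, 5]     # hypothetical 2-dimensional vector

y = mat_vec(A, x)
print(y)
```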

55 Example

56 House sizes:
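Matrix-vector multiplication gives a compact way to evaluate one hypothesis on many houses at once: put a 1 and the size in each row, and the parameters in the vector. The house sizes and the parameters below are assumptions made for illustration, since the slide's own numbers were lost.

```python
def mat_vec(A, x):
    """Multiply each row of A with the vector x and add up the products."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


house_sizes = [2104, 1416, 1534, 852]   # hypothetical sizes in square feet
theta = [-40, 0.25]                     # hypothetical h(x) = -40 + 0.25 * x

# Each row is [1, size], so row . theta = theta0 * 1 + theta1 * size.
design = [[1, size] for size in house_sizes]
predictions = mat_vec(design, theta)
print(predictions)
```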

57 Example

58 Details: A (m × n matrix: m rows, n columns) × B (n × o matrix: n rows, o columns) = C (m × o matrix). The i-th column of the matrix C is obtained by multiplying A with the i-th column of B (for i = 1, 2, ..., o).
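The column-by-column description can be followed literally in code: take each column of the second matrix, multiply the first matrix by it, and collect the results as the columns of the product. The matrices below are hypothetical; `mat_mul` and `mat_vec` are illustrative names.

```python
def mat_vec(A, x):
    """Multiply each row of A with the vector x and add up the products."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def mat_mul(A, B):
    """Multiply an m x n matrix A by an n x o matrix B, one column of B at a time."""
    n, o = len(B), len(B[0])
    if any(len(row) != n for row in A):
        raise ValueError("columns of A must equal rows of B")
    # columns[j] is the j-th column of C, obtained as A times the j-th column of B.
    columns = [mat_vec(A, [B[k][j] for k in range(n)]) for j in range(o)]
    # Turn the list of columns back into a list of rows.
    return [list(row) for row in zip(*columns)]


A = [[1, 3, 2],
     [4, 0, 1]]     # hypothetical 2 x 3 matrix
B = [[1, 3],
     [0, 1],
     [5, 2]]        # hypothetical 3 x 2 matrix

C = mat_mul(A, B)   # 2 x 2 result
print(C)
```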

59 Example

60 House sizes: Have 3 competing hypotheses: Matrix × Matrix
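With several competing hypotheses, one matrix-matrix product yields every prediction at once: one row per house, one column per hypothesis. The house sizes and the three parameter pairs below are assumptions for illustration; the slide's own numbers did not survive extraction.

```python
def mat_mul(A, B):
    """Multiply an m x n matrix A by an n x o matrix B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


house_sizes = [2104, 1416, 1534, 852]        # hypothetical sizes
design = [[1, size] for size in house_sizes]  # 4 x 2 matrix, rows are [1, size]

# Hypothetical hypotheses, one per column: [theta0; theta1].
#   h1(x) = -40 + 0.25x,  h2(x) = 200 + 0.1x,  h3(x) = -150 + 0.4x
params = [[-40, 200, -150],
          [0.25, 0.1, 0.4]]

predictions = mat_mul(design, params)  # 4 x 3: predictions[i][j] is h_(j+1) on house i
for row in predictions:
    print([round(p, 1) for p in row])
```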

61 Let A and B be matrices. Then in general, A × B ≠ B × A (not commutative). E.g.
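A small counterexample is enough to show that the order of multiplication matters. The two 2 × 2 matrices below are hypothetical picks, not the ones from the slide.

```python
def mat_mul(A, B):
    """Multiply an m x n matrix A by an n x o matrix B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


A = [[1, 1],
     [0, 0]]
B = [[0, 0],
     [2, 0]]

AB = mat_mul(A, B)
BA = mat_mul(B, A)
print(AB, BA)  # the two products differ
```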

62 Let Let Compute Compute

63 Identity Matrix: Denoted I (or I_{n×n}). Examples of identity matrices: 2 × 2, 3 × 3, 4 × 4. For any matrix A, A × I = I × A = A.
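The identity property can be checked directly. Note that for a non-square matrix the two identity matrices have different sizes: an m × n matrix takes the n × n identity on the right and the m × m identity on the left. The matrix below is a hypothetical 2 × 3 example.

```python
def identity(n):
    """n x n identity matrix: ones on the diagonal, zeros elsewhere."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def mat_mul(A, B):
    """Multiply an m x n matrix A by an n x o matrix B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


A = [[1, 2, 3],
     [4, 5, 6]]   # hypothetical 2 x 3 matrix

right = mat_mul(A, identity(3))  # A times the 3 x 3 identity
left = mat_mul(identity(2), A)   # the 2 x 2 identity times A
print(right == A, left == A)
```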

64 Not all numbers have an inverse. Matrix inverse: If A is an m × m matrix, and if it has an inverse, then A × A^(-1) = A^(-1) × A = I. Matrices that don't have an inverse are "singular" or "degenerate".
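Just as the number 0 has no reciprocal, a square matrix whose determinant is 0 has no inverse. The sketch below handles only the 2 × 2 case with the closed-form formula and exact fractions, and returns None for a singular matrix; the example matrices are hypothetical, and a general m × m inverse would need a different method.

```python
from fractions import Fraction


def inverse_2x2(M):
    """Inverse of a 2 x 2 integer matrix, or None if the matrix is singular."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        return None  # singular (degenerate): no inverse exists
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]


def mat_mul(A, B):
    """Multiply an m x n matrix A by an n x o matrix B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


A = [[3, 4],
     [2, 16]]                 # hypothetical invertible matrix
A_inv = inverse_2x2(A)
product = mat_mul(A, A_inv)   # should be the 2 x 2 identity

singular = inverse_2x2([[1, 2], [2, 4]])  # second row is twice the first
print(A_inv, product, singular)
```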

65 Matrix Transpose. Example: Let A be an m × n matrix, and let B = A^T. Then B is an n × m matrix, and B_ij = A_ji.
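Transposing swaps rows and columns, which in Python can be sketched with `zip`. The 2 × 3 matrix below is a hypothetical example.

```python
def transpose(A):
    """Return the transpose: rows of A become columns of the result."""
    return [list(col) for col in zip(*A)]


A = [[1, 2, 0],
     [3, 5, 9]]     # hypothetical 2 x 3 matrix

B = transpose(A)    # 3 x 2 matrix with B[j][i] == A[i][j]
print(B)
```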
