CSC 4510 Machine Learning


4: Regression (continued). CSC 4510 Machine Learning. Dr. Mary-Angela Papalaskari, Department of Computing Sciences, Villanova University. Course website: www.csc.villanova.edu/~map/4510/ The slides in this presentation are adapted from the Stanford online ML course, http://www.ml-class.org/

Last time: introduction to linear regression; intuition for the least squares approximation; intuition for the gradient descent algorithm; hands-on: a simple example using Excel.

Today: how to apply gradient descent to minimize the cost function for regression; linear algebra refresher.

Reminder: sample problem. Housing Prices (Portland, OR). [scatter plot: Price (in 1000s of dollars), 0 to 500, vs. Size (feet²), 0 to 3000]

Reminder: Notation. Training set of housing prices (Portland, OR):

Size in feet² (x)    Price ($) in 1000's (y)
2104                 460
1416                 232
1534                 315
852                  178

Notation: m = number of training examples; x's = input variable / features; y's = output variable / target variable.

Reminder: learning algorithm for the hypothesis function h. Training Set → Learning Algorithm → h. Given the size of a house, h estimates its price. Linear hypothesis: univariate linear regression.

Gradient descent algorithm, applied to the linear regression model.

Today: how to apply gradient descent to minimize the cost function for regression: 1. a closer look at the cost function; 2. applying gradient descent to find the minimum of the cost function. Then: linear algebra refresher.

Hypothesis: hθ(x) = θ₀ + θ₁x. Parameters: θ₀, θ₁. Cost function: J(θ₀, θ₁) = (1/2m) Σᵢ (hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾)². Goal: minimize J(θ₀, θ₁) over θ₀, θ₁.

Simplified hypothesis (θ₀ = 0): hθ(x) = θ₁x. Parameter: θ₁. Cost function: J(θ₁) = (1/2m) Σᵢ (hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾)². Goal: minimize J(θ₁).

[three slides of paired plots, for θ₀ = 0: on the left the hypothesis hθ(x) (for fixed θ₁, a function of x) over the training points; on the right the corresponding value of J(θ₁) (a function of the parameter θ₁); shown for hθ(x) = x, hθ(x) = 0.5x, and hθ(x) = 0]
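The cost values behind plots like these can be checked directly. The sketch below assumes the toy dataset (1,1), (2,2), (3,3) suggested by the axes (the transcript does not preserve the actual points) and evaluates J(θ₁) for the three hypotheses above:

```python
import numpy as np

# Toy training set (an assumption: points on y = x, as the plots suggest)
x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])
m = len(x)

def cost(theta1):
    """Squared-error cost J(theta1) for h(x) = theta1 * x (theta0 = 0)."""
    residuals = theta1 * x - y
    return (1.0 / (2 * m)) * np.sum(residuals ** 2)

print(cost(1.0))   # perfect fit, so J(1) = 0
print(cost(0.5))   # underestimates every y, so J > 0
print(cost(0.0))   # the flat hypothesis is worst of the three
```

For this data J(1) = 0, and the cost grows as θ₁ moves away from 1, which is exactly the bowl shape in the right-hand plots.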

What if θ₀ ≠ 0? Hypothesis: hθ(x) = θ₀ + θ₁x. Parameters: θ₀, θ₁. Cost function: J(θ₀, θ₁). Goal: minimize J(θ₀, θ₁).

[paired plots: the hypothesis hθ(x) = 10 + 0.1x (for fixed θ₀, θ₁, a function of x), with Price ($) in 1000's, 0 to 500, vs. Size in feet², 0 to 3000; and J(θ₀, θ₁) as a function of the parameters θ₀, θ₁]


[four slides: contour plots of J(θ₀, θ₁) as a function of the parameters θ₀, θ₁, each paired with the hypothesis hθ(x) for the corresponding fixed θ₀, θ₁ (a function of x)]

Today: how to apply gradient descent to minimize the cost function for regression: 1. a closer look at the cost function; 2. applying gradient descent to find the minimum of the cost function. Then: linear algebra refresher.

Have some function J(θ₀, θ₁). Want min over θ₀, θ₁ of J(θ₀, θ₁). Gradient descent algorithm outline: start with some θ₀, θ₁; keep changing θ₀, θ₁ to reduce J(θ₀, θ₁), until we hopefully end up at a minimum.

Have some function J(θ₀, θ₁). Want min over θ₀, θ₁ of J(θ₀, θ₁). Gradient descent algorithm: repeat until convergence { θⱼ := θⱼ − α ∂/∂θⱼ J(θ₀, θ₁), simultaneously for j = 0 and j = 1 }, where α is the learning rate.

If α is too small, gradient descent can be slow. If α is too large, gradient descent can overshoot the minimum. It may fail to converge, or even diverge.
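Both failure modes are easy to reproduce. The sketch below is an illustrative toy, not the course's housing example: it runs gradient descent on J(θ) = θ², whose gradient is 2θ and whose minimum is at θ = 0, for three learning rates:

```python
# Effect of the learning rate alpha on gradient descent for J(theta) = theta^2
# (gradient 2*theta, minimum at theta = 0). Each update multiplies theta
# by the factor (1 - 2*alpha), so |1 - 2*alpha| > 1 means divergence.
def descend(alpha, steps=50, theta=1.0):
    for _ in range(steps):
        theta = theta - alpha * 2 * theta
    return theta

small = descend(alpha=0.001)  # too small: still far from 0 after 50 steps
good  = descend(alpha=0.1)    # converges quickly toward 0
large = descend(alpha=1.5)    # overshoots more each step and diverges
print(small, good, large)
```

With α = 0.001 the iterate has barely moved after 50 steps; with α = 0.1 it is essentially at the minimum; with α = 1.5 its magnitude doubles every step.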

At a local minimum the derivative term is zero, so the update leaves the current value of θ₁ unchanged.

Gradient descent can converge to a local minimum, even with the learning rate α fixed.

Gradient descent algorithm, applied to the linear regression model: plug the cost function J(θ₀, θ₁) into the update rule and work out the partial derivatives.

Gradient descent algorithm for linear regression: repeat until convergence { θ₀ := θ₀ − α (1/m) Σᵢ (hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾); θ₁ := θ₁ − α (1/m) Σᵢ (hθ(x⁽ⁱ⁾) − y⁽ⁱ⁾) x⁽ⁱ⁾ }; update θ₀ and θ₁ simultaneously.
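These update rules can be sketched in a few lines of Python. The data below are made up for illustration (points on y = 1 + 2x), since the slide's own numbers are not in the transcript; note that both gradients are computed before either parameter is updated, which is what "simultaneously" means:

```python
import numpy as np

# Batch gradient descent for univariate linear regression,
# with the simultaneous update of theta0 and theta1.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])      # exactly y = 1 + 2x (made-up data)
m = len(x)

theta0, theta1 = 0.0, 0.0
alpha = 0.05
for _ in range(5000):
    h = theta0 + theta1 * x             # current predictions h_theta(x)
    # compute BOTH gradients before updating EITHER parameter
    grad0 = (1.0 / m) * np.sum(h - y)
    grad1 = (1.0 / m) * np.sum((h - y) * x)
    theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1

print(theta0, theta1)                   # approaches (1, 2)
```

If θ₀ were updated first and its new value used inside grad1, the algorithm would no longer be gradient descent on J, which is why the tuple assignment matters.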

[two slides: surface plot of J(θ₀, θ₁) over the (θ₀, θ₁) plane]


[nine slides animating gradient descent: at each step, the current hypothesis hθ(x) (for fixed θ₀, θ₁, a function of x) shown next to the descent trajectory on the contour plot of J as a function of the parameters θ₀, θ₁]

Batch Gradient Descent. "Batch": each step of gradient descent uses all the training examples. Alternative: process part of the dataset for each step of the algorithm.
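The "process part of the dataset per step" alternative is usually called mini-batch gradient descent. A minimal sketch on made-up noisy data (all names and constants here are illustrative, not from the slides):

```python
import numpy as np

# Mini-batch gradient descent: each step uses a small random batch
# rather than all m training examples.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 1.0 + 2.0 * x + rng.normal(0, 0.1, size=200)   # noisy y = 1 + 2x

theta0, theta1, alpha, batch = 0.0, 0.0, 0.01, 20
for epoch in range(200):
    order = rng.permutation(len(x))        # reshuffle each pass
    for start in range(0, len(x), batch):
        idx = order[start:start + batch]   # one mini-batch of 20 examples
        h = theta0 + theta1 * x[idx]
        g0 = np.mean(h - y[idx])
        g1 = np.mean((h - y[idx]) * x[idx])
        theta0, theta1 = theta0 - alpha * g0, theta1 - alpha * g1

print(theta0, theta1)   # near (1, 2)
```

Each step is 10x cheaper than a full batch step here, at the price of noisier updates; the shuffle per epoch keeps the batches unbiased.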

What's next? We are not in univariate regression anymore:

x₀   Size (feet²)   Number of bedrooms   Number of floors   Age of home (years)   Price ($1000)
1    2104           5                    1                   45                    460
1    1416           3                    2                   40                    232
1    1534           3                    2                   30                    315
1    852            2                    1                   36                    178

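This training set is naturally stored as a design matrix, one row per example and one column per feature (including the leading column of 1's for the intercept term x₀). A short sketch:

```python
import numpy as np

# The multivariate training set above as a design matrix X:
# columns are x0 (always 1), size, bedrooms, floors, age.
X = np.array([
    [1, 2104, 5, 1, 45],
    [1, 1416, 3, 2, 40],
    [1, 1534, 3, 2, 30],
    [1,  852, 2, 1, 36],
], dtype=float)
y = np.array([460.0, 232.0, 315.0, 178.0])   # price in $1000s

m, n = X.shape   # m training examples, n columns (features incl. x0)
print(m, n)      # 4 examples, 5 columns
```

With this layout, everything from the univariate case carries over by replacing scalar operations with matrix ones, which is exactly why the refresher below matters.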

Today: how to apply gradient descent to minimize the cost function for regression: 1. a closer look at the cost function; 2. applying gradient descent to find the minimum of the cost function. Then: linear algebra refresher.

Matrix: rectangular array of numbers. Matrix elements (entries of the matrix): Aᵢⱼ = the entry in the i-th row, j-th column. Dimension of a matrix: number of rows × number of columns, e.g. 4 × 2.

Vector: an n × 1 matrix; yᵢ = the i-th element. Vectors may be 1-indexed or 0-indexed.

Matrix Addition

Scalar Multiplication

Combination of Operands
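The three operations named on these slides, sketched with made-up matrices (the slides' own examples are not in the transcript):

```python
import numpy as np

# Matrix addition, scalar multiplication, and a combination of both.
A = np.array([[1.0, 0.0],
              [2.0, 5.0],
              [3.0, 1.0]])
B = np.array([[4.0, 0.5],
              [2.0, 5.0],
              [0.0, 1.0]])

C = A + B          # element-wise: both matrices must be the same dimension
D = 3 * A          # every entry multiplied by the scalar
E = 3 * A + B / 2  # combination: scalar multiplications first, then addition
print(C, D, E, sep="\n")
```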

Matrix-vector multiplication

Details: an m × n matrix (m rows, n columns) times an n × 1 matrix (an n-dimensional vector) gives an m-dimensional vector. To get yᵢ, multiply the i-th row of the matrix element-wise with the vector, and add the products up.

Example

House sizes:
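The point of this slide is that all predictions can be computed with a single matrix-vector multiplication. A sketch, using the four sizes from the training set; the hypothesis h(x) = −40 + 0.25x is an assumed example, not recovered from the transcript:

```python
import numpy as np

# Predict every house price with ONE matrix-vector multiplication.
sizes = np.array([2104.0, 1416.0, 1534.0, 852.0])
X = np.column_stack([np.ones_like(sizes), sizes])  # prepend x0 = 1 per row
theta = np.array([-40.0, 0.25])                    # [theta0, theta1] (assumed)

predictions = X @ theta   # row i gives theta0 * 1 + theta1 * sizes[i]
print(predictions)
```

This is both shorter to write and faster to run than a loop over houses, since the library does the row-by-row multiply-and-add internally.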

Example

Details: an m × n matrix (m rows, n columns) times an n × o matrix (n rows, o columns) gives an m × o matrix. The i-th column of the result is obtained by multiplying the first matrix with the i-th column of the second (for i = 1, 2, …, o).

Example

House sizes: we have 3 competing hypotheses. Packing each hypothesis' parameters into a column of a matrix, all predictions for all hypotheses come out of one matrix-matrix multiplication.
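A sketch of that idea, again with the four training-set sizes; the three hypotheses here are assumed examples, since the transcript does not preserve the slide's matrices:

```python
import numpy as np

# Evaluate several competing hypotheses on all house sizes at once.
sizes = np.array([2104.0, 1416.0, 1534.0, 852.0])
X = np.column_stack([np.ones_like(sizes), sizes])   # 4 x 2 design matrix

# Each COLUMN holds one hypothesis' parameters [theta0; theta1] (assumed values)
Theta = np.array([[-40.0, 200.0, -150.0],
                  [0.25,    0.1,    0.4]])          # 2 x 3

P = X @ Theta   # 4 x 3: column i = the predictions of hypothesis i
print(P)
```

Column i of P is exactly X times column i of Theta, matching the "details" rule above.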

Let A and B be matrices. Then in general, A × B ≠ B × A: matrix multiplication is not commutative.
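A small made-up 2 × 2 example makes the non-commutativity concrete:

```python
import numpy as np

# Matrix multiplication is not commutative: in general A @ B != B @ A.
A = np.array([[1, 1],
              [0, 0]])
B = np.array([[0, 0],
              [2, 0]])

AB = A @ B   # [[2, 0], [0, 0]]
BA = B @ A   # [[0, 0], [2, 2]]
print(AB)
print(BA)
```

Here the two products do not even have the same nonzero pattern, so no rearrangement of factors is allowed when manipulating matrix expressions.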


Identity Matrix: denoted I (or Iₙₓₙ), with 1's on the diagonal and 0's elsewhere. Examples of identity matrices: 2 × 2, 3 × 3, 4 × 4. For any matrix A: A · I = I · A = A (with identity matrices of the matching sizes).
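A quick check of the identity property, with a made-up non-square A to show that the two identity matrices must have different sizes:

```python
import numpy as np

# For a 2 x 3 matrix A: I(2x2) @ A = A and A @ I(3x3) = A.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])   # 2 x 3

I2 = np.eye(2)   # 2 x 2 identity
I3 = np.eye(3)   # 3 x 3 identity

print(np.array_equal(I2 @ A, A))   # True
print(np.array_equal(A @ I3, A))   # True
```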

Not all numbers have an inverse (0 does not). Matrix inverse: if A is an m × m matrix, and if it has an inverse, then A A⁻¹ = A⁻¹ A = I. Matrices that don't have an inverse are called "singular" or "degenerate".
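Both cases can be demonstrated numerically; the matrices below are made-up examples, with the singular one having proportional rows so its determinant is 0:

```python
import numpy as np

# Invertible case: A @ inv(A) = I.
A = np.array([[3.0, 4.0],
              [2.0, 16.0]])
A_inv = np.linalg.inv(A)
print(np.allclose(A @ A_inv, np.eye(2)))   # True

# Singular ("degenerate") case: rows are proportional, det = 0, no inverse.
S = np.array([[1.0, 2.0],
              [2.0, 4.0]])
print(np.linalg.det(S))                    # 0 (up to rounding)
try:
    np.linalg.inv(S)
except np.linalg.LinAlgError:
    print("S is singular: no inverse")
```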

Matrix Transpose: let A be an m × n matrix. Then its transpose Aᵀ is an n × m matrix, and (Aᵀ)ᵢⱼ = Aⱼᵢ.
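A sketch of the transpose rule with a made-up 2 × 3 matrix (the slide's own example is not in the transcript):

```python
import numpy as np

# Transpose: if A is m x n, A.T is n x m and (A.T)[i, j] == A[j, i].
A = np.array([[1, 2, 0],
              [3, 5, 9]])   # 2 x 3

B = A.T                     # 3 x 2
print(B.shape)              # (3, 2)
print(B[2, 1] == A[1, 2])   # True: the entry 9 swaps row/column indices
```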