REGRESSION ANALYSIS: LINEAR BY MAUAJAMA FIRDAUS & TULIKA SAHA

MACHINE LEARNING It is the science of getting computers to learn without being explicitly programmed. Machine learning is an area of artificial intelligence concerned with the development of techniques that allow computers to "learn". More specifically, machine learning is a method for creating computer programs by the analysis of data sets.

APPLICATIONS Search engines such as Google and Bing, Facebook's photo-tagging application, self-customizing programs, and many more.

SUPERVISED LEARNING We are given the right answer for each example in the training set; the task of the algorithm is to produce more such right answers for new examples. It consists of two kinds of problems: the regression problem and the classification problem.

REGRESSION PROBLEM The term regression refers to the fact that we are trying to predict a continuous-valued output.

CLASSIFICATION PROBLEM It refers to the fact that we are trying to predict discrete-valued outputs.

UNSUPERVISED LEARNING We are given a set of data or examples in the training set without prior information about what the data represents. The task of the algorithm is to find structure in the data. The clustering problem is an example of unsupervised learning.

LINEAR REGRESSION Why are we looking at this algorithm? Machine learning is a field of predictive modelling. Linear regression was developed in the field of statistics as a model that attempts to show the relationship between two variables with a linear equation, and was borrowed by machine learning. It finds application in evaluating trends and sales estimates, analyzing the impact of price changes, and many more.

LINEAR REGRESSION Basic framework: the dependent variable y, also called the output or explained variable, is what we want to predict; the independent variables x, also called the input or explanatory variables, are what we use for making predictions. Regression is the general task of attempting to predict the value of the output variable y from the input variables x.

MODEL REPRESENTATION How do we represent the hypothesis h? Given a training set of m examples $(x^{(i)}, y^{(i)})$, the learning algorithm outputs a function h that maps from an input x to a predicted output. For linear regression with one variable, the hypothesis is $h_\theta(x) = \theta_0 + \theta_1 x$, where $\theta_0$ and $\theta_1$ are the parameters of the model. How do we choose $\theta_0$ and $\theta_1$?

COST FUNCTION Idea: choose $\theta_0$ and $\theta_1$ so that $h_\theta(x)$ is as close to y as possible for our training examples (x, y).

COST FUNCTION Going with this idea: if the hypothesis is $h_\theta(x) = \theta_0 + \theta_1 x$, we want to minimise the squared error, i.e. $\sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$; that is, the difference between the predicted value $h_\theta(x^{(i)})$ and the actual label $y^{(i)}$ should be as small as possible.

COST FUNCTION Therefore, the cost function becomes $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$, where the summation $\sum_{i=1}^{m}$ represents a sum over the training set from $i = 1$ to $m$, and the factor $\frac{1}{2}$ is harmless because minimising one half of something gives the same values of $\theta_0$ and $\theta_1$ as minimising the entire thing.

COST FUNCTION So, overall, we want to find the values of $\theta_0$ and $\theta_1$ that minimise the entire expression: $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$.
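
As an illustration, here is a minimal NumPy sketch of this cost function (the function and variable names, and the toy data, are assumptions for illustration; the slides give only the formula):

```python
import numpy as np

def compute_cost(X, y, theta):
    """Squared-error cost J(theta) = 1/(2m) * sum((h(x) - y)^2).

    X     : (m, n+1) design matrix with a leading column of ones (x0 = 1)
    y     : (m,) vector of labels
    theta : (n+1,) vector of parameters
    """
    m = len(y)
    errors = X @ theta - y            # h_theta(x^(i)) - y^(i) for every example
    return (errors @ errors) / (2 * m)

# Illustrative use (assumed data): a perfect fit gives zero cost.
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])
print(compute_cost(X, y, np.array([0.0, 1.0])))   # 0.0
```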

GRADIENT DESCENT ALGORITHM The gradient descent algorithm is used to minimise some function J (say, a cost function). Problem setup: we have some function $J(\theta_0, \theta_1)$ and want $\min_{\theta_0, \theta_1} J(\theta_0, \theta_1)$. Outline: start with some $\theta_0$ and $\theta_1$ (say $\theta_0 = 0$, $\theta_1 = 0$), and keep changing $\theta_0$ and $\theta_1$ to reduce $J(\theta_0, \theta_1)$ until we hopefully end up at a minimum.

GRADIENT DESCENT ALGORITHM Repeat until convergence: $\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$ for $j = 0$ and $j = 1$, where := is the assignment operator and $\alpha$ is the learning rate that controls how big a step we take downhill while descending. The update must be simultaneous: compute the new values of both $\theta_0$ and $\theta_1$ from the old values before overwriting either.
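
A small Python sketch of the simultaneous update (the toy objective $J(\theta) = \theta_0^2 + \theta_1^2$ and all names are illustrative assumptions):

```python
import numpy as np

def gd_step(theta, grad_J, alpha):
    """One step theta_j := theta_j - alpha * dJ/dtheta_j for all j at once.

    Evaluating grad_J at the *old* theta and updating every component
    together is exactly the simultaneous update described above.
    """
    return theta - alpha * grad_J(theta)

# Toy example (assumed): J(theta) = theta0^2 + theta1^2, gradient 2 * theta.
theta = np.array([3.0, 4.0])
for _ in range(100):
    theta = gd_step(theta, lambda t: 2 * t, alpha=0.1)
print(theta)   # close to the minimum at [0, 0]
```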

GRADIENT DESCENT INTUITION For example, say we have only one parameter $\theta_1 \in \mathbb{R}$, so $J = J(\theta_1)$; the GD equation then becomes $\theta_1 := \theta_1 - \alpha \frac{d}{d\theta_1} J(\theta_1)$. Here $\frac{d}{d\theta_1} J(\theta_1)$ is the derivative term, which denotes the slope of the line that is tangent to the function at that point.

GRADIENT DESCENT INTUITION If $\alpha$ is too small, then gradient descent will converge slowly, because each step is very small and hence a lot of steps are taken before it gets anywhere close to the global minimum.

GRADIENT DESCENT INTUITION If $\alpha$ is too large, then gradient descent can overshoot the minimum because the steps taken are huge. It may fail to converge, or even diverge.

GRADIENT DESCENT FOR LINEAR REGRESSION So, we apply the GD algorithm to minimise the squared error cost function $J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$, where $h_\theta(x) = \theta_0 + \theta_1 x$. This requires calculating the derivative part $\frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1)$ for $j = 0$ and $j = 1$.

GRADIENT DESCENT FOR LINEAR REGRESSION For $j = 0$: $\frac{\partial}{\partial \theta_0} J(\theta_0, \theta_1) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$. For $j = 1$: $\frac{\partial}{\partial \theta_1} J(\theta_0, \theta_1) = \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$.

GRADIENT DESCENT FOR LINEAR REGRESSION Now, substituting these derivatives into the GD update and repeating until convergence: $\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$ and $\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$, updating $\theta_0$ and $\theta_1$ simultaneously.
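
Putting the pieces together, a minimal NumPy sketch of batch gradient descent for one-variable linear regression (the synthetic data, learning rate, and iteration count are assumptions for illustration):

```python
import numpy as np

def gradient_descent(x, y, alpha=0.01, n_iters=1000):
    """Batch gradient descent for h(x) = theta0 + theta1 * x.

    Implements the updates
        theta0 := theta0 - alpha * (1/m) * sum(h(x) - y)
        theta1 := theta1 - alpha * (1/m) * sum((h(x) - y) * x)
    with a simultaneous update of theta0 and theta1.
    """
    m = len(y)
    theta0, theta1 = 0.0, 0.0
    for _ in range(n_iters):
        errors = (theta0 + theta1 * x) - y              # h_theta(x) - y
        grad0 = errors.sum() / m
        grad1 = (errors * x).sum() / m
        theta0, theta1 = theta0 - alpha * grad0, theta1 - alpha * grad1
    return theta0, theta1

# Illustrative data (assumed): y is roughly 2x + 1 plus a little noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 5, 100)
y = 2 * x + 1 + rng.normal(0, 0.1, 100)
print(gradient_descent(x, y, alpha=0.05, n_iters=5000))  # approx (1.0, 2.0)
```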

FEATURE SCALING If one feature has a range of, say, 0 to 2000 and another of, say, 0 to 5, i.e. a 2000-to-5 ratio, then the contours of the cost function take on a very skewed elliptical shape, and GD may end up taking a long time, oscillating back and forth before it finally finds its way to the global minimum. A useful thing to do is to scale the features so that the different features take on a similar range of values. By feature scaling we get the features into a range of roughly -1 to +1.
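
A short sketch of one common scaling rule, mean normalisation (dividing by the feature's range is an assumption here; the slides only ask that features end up in roughly the -1 to +1 range, and dividing by the standard deviation is equally common):

```python
import numpy as np

def scale_features(X):
    """Mean-normalise each column: x := (x - mean) / (max - min)."""
    return (X - X.mean(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Example: two features with very different ranges (0-2000 vs 0-5).
X = np.column_stack([np.linspace(0, 2000, 50), np.linspace(0, 5, 50)])
X_scaled = scale_features(X)
print(X_scaled.min(axis=0), X_scaled.max(axis=0))   # both columns ~[-0.5, 0.5]
```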

LINEAR REGRESSION WITH MULTIPLE VARIABLES For multiple features, the hypothesis becomes $h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n$. For convenience of notation, define $x_0 = 1$, so $h_\theta(x) = \sum_{j=0}^{n} \theta_j x_j = \theta^T x$.
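
A one-function sketch of this vectorised hypothesis (names and toy data are illustrative assumptions):

```python
import numpy as np

def hypothesis(X, theta):
    """h_theta(x) = theta^T x with x0 = 1, vectorised over all m examples.

    X     : (m, n) matrix of raw features
    theta : (n+1,) parameter vector, theta[0] multiplying x0 = 1
    """
    X1 = np.column_stack([np.ones(len(X)), X])   # prepend x0 = 1 to each row
    return X1 @ theta                            # (m,) vector of predictions

# Example (assumed data): two features, theta = [1, 2, 3].
X = np.array([[1.0, 0.0], [0.0, 1.0]])
print(hypothesis(X, np.array([1.0, 2.0, 3.0])))  # [3. 4.]
```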

POLYNOMIAL REGRESSION It allows us to use the machinery of linear regression to fit very complicated, even non-linear, functions to the data. Quadratic model: $h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2$. Cubic model: $h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3$.

POLYNOMIAL REGRESSION The cubic model can be written in the linear form $h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3$, where $x_1 = x$, $x_2 = x^2$, $x_3 = x^3$. Feature scaling is required here, since $x$, $x^2$ and $x^3$ take on very different ranges of values.
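
A small sketch of this feature mapping (the helper name polynomial_features is an assumption):

```python
import numpy as np

def polynomial_features(x, degree):
    """Map a single feature x to columns [x, x^2, ..., x^degree], turning
    polynomial regression into linear regression over the new features."""
    return np.column_stack([x ** d for d in range(1, degree + 1)])

x = np.array([1.0, 10.0, 100.0])
print(polynomial_features(x, 3))
# Columns x, x^2, x^3 span 1..100, 1..1e4, 1..1e6 respectively --
# hence the need for feature scaling noted above.
```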

NORMAL EQUATION GD is an iterative algorithm for linear regression: it takes multiple iterations to reach the global minimum, i.e. to minimise the cost function. The normal equation is a method to solve for $\theta$ analytically, i.e. we get the optimum value in one step. Intuition: if $\theta \in \mathbb{R}$ is one-dimensional and $J(\theta)$ is a quadratic function, we minimise it by calculating the derivative of the function and setting it to 0, i.e. $\frac{d}{d\theta} J(\theta) = 0$, and solving for $\theta$.

NORMAL EQUATION Doing the same for a vector-valued $\theta$ gives the closed-form solution $\theta = (X^T X)^{-1} X^T y$, where X is the design matrix of training inputs (one example per row, with $x_0 = 1$) and y is the vector of labels.
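
A minimal sketch of the normal equation (using np.linalg.solve rather than an explicit matrix inverse is a standard numerical-stability choice, not something the slides prescribe):

```python
import numpy as np

def normal_equation(X, y):
    """Solve theta = (X^T X)^{-1} X^T y in closed form.

    X : (m, n+1) design matrix with a leading column of ones
    y : (m,) label vector
    """
    return np.linalg.solve(X.T @ X, X.T @ y)

# Toy data (assumed): y = 2x + 1 exactly.
x = np.array([0.0, 1.0, 2.0, 3.0])
X = np.column_stack([np.ones_like(x), x])
print(normal_equation(X, 2 * x + 1))   # [1. 2.] in one step, no iterations
```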

GRADIENT DESCENT Vs NORMAL EQUATION
Gradient descent: need to choose $\alpha$; needs many iterations; works well even if n is large, where n is the number of features.
Normal equation: no need to choose $\alpha$; needs no iterations; slow if n is large, because it needs to compute $(X^T X)^{-1}$, which is an n × n matrix.