DS 4400 Machine Learning and Data Mining I
Alina Oprea, Associate Professor, CCIS, Northeastern University
September 18, 2018

Logistics
- HW 1 is on Piazza and Gradescope. Deadline: Friday, Sept. 28, 2018
- Office hours: Alina: Thu 4:30-6:00pm (ISEC 625); Anand: Tue 2-3pm (ISEC 605)
- How to submit HW: create a PDF and submit on Gradescope before 11:59pm on the day the assignment is due; include a link to your code and a ReadMe file; use a Jupyter notebook in R or Python

Collaboration policy
What is allowed:
- You can discuss the homework with your colleagues
- You can post questions on Piazza and come to office hours
- You can search for online resources to better understand class concepts
What is not allowed:
- Sharing your written answers with colleagues
- Sharing your code with, or receiving code from, colleagues
- Using code from the Internet

Linear regression
[Figure: example dataset with features (TV ads spending) and response variables]

Simple linear regression
Dataset: $x^{(i)} \in \mathbb{R}$, $y^{(i)} \in \mathbb{R}$. Hypothesis: $h_\theta(x) = \theta_0 + \theta_1 x$
Loss: $J(\theta) = \frac{1}{n} \sum_{i=1}^{n} \left( \theta_0 + \theta_1 x^{(i)} - y^{(i)} \right)^2$
Setting the partial derivatives to zero:
$\frac{\partial J(\theta)}{\partial \theta_0} = \frac{2}{n} \sum_{i=1}^{n} \left( \theta_0 + \theta_1 x^{(i)} - y^{(i)} \right) = 0$
$\frac{\partial J(\theta)}{\partial \theta_1} = \frac{2}{n} \sum_{i=1}^{n} x^{(i)} \left( \theta_0 + \theta_1 x^{(i)} - y^{(i)} \right) = 0$
Solution of the minimized loss:
$\theta_0 = \bar{y} - \theta_1 \bar{x}$, $\quad \theta_1 = \frac{\sum_i (x^{(i)} - \bar{x})(y^{(i)} - \bar{y})}{\sum_i (x^{(i)} - \bar{x})^2}$
where $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x^{(i)}$ and $\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y^{(i)}$
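A minimal sketch of these closed-form formulas in Python/NumPy (my addition; the variable names and synthetic data are illustrative, not from the slides):

```python
import numpy as np

def fit_simple_linear_regression(x, y):
    """Closed-form least-squares fit of h(x) = theta0 + theta1 * x."""
    x_bar, y_bar = x.mean(), y.mean()
    # theta1 = sum((x_i - x_bar)(y_i - y_bar)) / sum((x_i - x_bar)^2)
    theta1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
    # theta0 = y_bar - theta1 * x_bar
    theta0 = y_bar - theta1 * x_bar
    return theta0, theta1

# Example: noisy line y = 2 + 3x
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2 + 3 * x + rng.normal(scale=0.5, size=100)
print(fit_simple_linear_regression(x, y))  # approximately (2.0, 3.0)
```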

Interpretation
- Fitted line: $y = \theta_0 + \theta_1 x$
- Residual: the vertical distance between a data point $(x^{(i)}, y^{(i)})$ and the fitted line
- $\theta_0$: intercept; $\theta_1 = \Delta y / \Delta x$: slope
- $\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$, with $h_\theta(x) = \theta_0 + \theta_1 x$

Regression Learning pipeline
- Training: labeled data $(x^{(i)}, y^{(i)})$ → preprocessing (normalization, standardization, feature selection) → feature extraction → train the regression model $h_\theta(x) = \theta_0 + \theta_1 x$; measure train MSE
- Testing: new unlabeled data $x$ → the trained regression model → predictions $h_\theta(x)$ (e.g., price, risk score); measure test MSE

Outline
- Multiple linear regression: derivation in matrix form
- Practical issues: feature scaling and normalization; outliers; categorical variables
- Gradient descent: an efficient algorithm for optimizing the loss function; training LR with gradient descent

Multiple Linear Regression
Dataset: $x^{(i)} \in \mathbb{R}^d$, $y^{(i)} \in \mathbb{R}$

Use Vectorization
Write the hypothesis compactly as $h_\theta(x^{(i)}) = \theta^T x^{(i)}$ (with a constant feature $x_0^{(i)} = 1$ absorbing the intercept)

Loss function
$J(\theta) = \frac{1}{n} \sum_{i=1}^{n} \left( \theta^T x^{(i)} - y^{(i)} \right)^2 = \frac{1}{n} \| X\theta - y \|^2$ (squared Euclidean norm)
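For concreteness, a sketch I've added (assuming a design matrix X whose rows are the $x^{(i)}$) showing that the summation form and the matrix-norm form agree:

```python
import numpy as np

def loss_loop(X, y, theta):
    # J(theta) = (1/n) * sum_i (theta^T x_i - y_i)^2
    n = len(y)
    return sum((theta @ X[i] - y[i]) ** 2 for i in range(n)) / n

def loss_vectorized(X, y, theta):
    # J(theta) = (1/n) * ||X theta - y||^2
    return np.sum((X @ theta - y) ** 2) / len(y)

rng = np.random.default_rng(3)
X, y, theta = rng.normal(size=(20, 3)), rng.normal(size=20), rng.normal(size=3)
print(np.isclose(loss_loop(X, y, theta), loss_vectorized(X, y, theta)))  # True
```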

Matrix and vector gradients
- If $y = f(x)$ with $y \in \mathbb{R}$ a scalar and $x \in \mathbb{R}^n$ a vector: $\frac{\partial y}{\partial x} = \left[ \frac{\partial y}{\partial x_1}, \frac{\partial y}{\partial x_2}, \ldots, \frac{\partial y}{\partial x_n} \right]$
- If $y = f(x)$ with $y \in \mathbb{R}^m$ and $x \in \mathbb{R}^n$: $\frac{\partial y}{\partial x}$ is the $m \times n$ Jacobian matrix

Properties
- If $w, x$ are $d \times 1$ vectors: $\frac{\partial (w^T x)}{\partial x} = w$
- If $A$ is $n \times d$ and $x$ is $d \times 1$: $\frac{\partial (Ax)}{\partial x} = A$
- If $A$ is $d \times d$ and $x$ is $d \times 1$: $\frac{\partial (x^T A x)}{\partial x} = (A + A^T) x$
- If $A$ is symmetric: $\frac{\partial (x^T A x)}{\partial x} = 2Ax$
- If $x$ is $d \times 1$: $\frac{\partial \|x\|^2}{\partial x} = 2x^T$
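These identities are easy to sanity-check numerically; here is a small finite-difference check (my addition, not from the slides) for the quadratic-form rule:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4
A = rng.normal(size=(d, d))   # general (not necessarily symmetric) matrix
x = rng.normal(size=d)

f = lambda v: v @ A @ v       # f(x) = x^T A x
analytic = (A + A.T) @ x      # claimed gradient (A + A^T) x

# central finite differences, one coordinate at a time
eps = 1e-6
numeric = np.array([
    (f(x + eps * np.eye(d)[j]) - f(x - eps * np.eye(d)[j])) / (2 * eps)
    for j in range(d)
])
print(np.allclose(analytic, numeric, atol=1e-5))  # True
```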

Minimizing the loss function
$J(\theta) = \frac{1}{n} \| X\theta - y \|^2$
Using the chain rule: if $f(\theta) = h(g(\theta))$, then $\frac{\partial f(\theta)}{\partial \theta} = \frac{\partial h(g(\theta))}{\partial g(\theta)} \cdot \frac{\partial g(\theta)}{\partial \theta}$
Here $h(x) = \|x\|^2$ and $g(\theta) = X\theta - y$, so $\frac{\partial h}{\partial x} = 2x^T$ and $\frac{\partial g}{\partial \theta} = X$
$\frac{\partial J(\theta)}{\partial \theta} = \frac{2}{n} (X\theta - y)^T X = 0 \;\Rightarrow\; X^T (X\theta - y) = 0$

Closed-form solution
From the normal equations $X^T X \theta = X^T y$: $\theta = (X^T X)^{-1} X^T y$ when $X^T X$ is invertible; otherwise, use a generalized inverse $G$ of $A = X^T X$, i.e., a matrix satisfying $AGA = A$
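A hedged NumPy sketch of the closed-form fit (falling back to `lstsq` for the non-invertible case is my suggestion; names and data are illustrative):

```python
import numpy as np

def fit_closed_form(X, y):
    """Solve the normal equations X^T X theta = X^T y."""
    # np.linalg.solve fails if X^T X is singular; np.linalg.lstsq
    # computes a pseudoinverse-style solution and always succeeds.
    try:
        return np.linalg.solve(X.T @ X, X.T @ y)
    except np.linalg.LinAlgError:
        return np.linalg.lstsq(X, y, rcond=None)[0]

# Example with an intercept column of ones
rng = np.random.default_rng(2)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
theta_true = np.array([1.0, -2.0, 0.5])
y = X @ theta_true + 0.1 * rng.normal(size=50)
print(fit_closed_form(X, y))  # close to [1.0, -2.0, 0.5]
```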

Multiple Linear Regression (recap)
Dataset: $x^{(i)} \in \mathbb{R}^d$, $y^{(i)} \in \mathbb{R}$. Hypothesis: $h_\theta(x) = \theta^T x$
$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( \theta^T x^{(i)} - y^{(i)} \right)^2$ (the loss / cost function)

Regression Learning pipeline (recap)
- Training: labeled data → preprocessing (typically normalization, standardization, feature selection) → feature extraction → regression model; train MSE
- Testing: new unlabeled data → regression model → predictions (e.g., price, risk score); test MSE

Feature Standardization
$x_j^{(i)} \leftarrow \frac{x_j^{(i)} - \mu_j}{\sigma_j}$, where $\mu_j$ and $\sigma_j$ are the mean and standard deviation of feature $j$; each feature then has mean 0 and standard deviation 1

Other feature normalization
- Re-scaling: $x_j^{(i)} \leftarrow \frac{x_j^{(i)} - \min_j}{\max_j - \min_j} \in [0, 1]$, where $\min_j$ and $\max_j$ are the min and max values of feature $j$
- Mean normalization: $x_j^{(i)} \leftarrow \frac{x_j^{(i)} - \mu_j}{\max_j - \min_j}$, which gives the feature mean 0
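A minimal sketch of the three transformations in NumPy (function names are my own). In practice, compute the scaling statistics on the training set only and reuse them on the test set:

```python
import numpy as np

def standardize(X):
    # mean 0, standard deviation 1 per feature (column)
    return (X - X.mean(axis=0)) / X.std(axis=0)

def rescale(X):
    # map each feature into [0, 1]
    mn, mx = X.min(axis=0), X.max(axis=0)
    return (X - mn) / (mx - mn)

def mean_normalize(X):
    # mean 0, scaled by the feature's range
    mn, mx = X.min(axis=0), X.max(axis=0)
    return (X - X.mean(axis=0)) / (mx - mn)
```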

Outliers
[Figure: two fitted lines; the dashed model is fit without the outlier point]
- Linear regression is not resilient to outliers!
- Outliers can be eliminated based on their residual values
- Other techniques for outlier detection exist
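One way to implement residual-based removal, sketched below (the 3-standard-deviation threshold is an assumption on my part, not from the slides):

```python
import numpy as np

def drop_outliers(X, y, theta, k=3.0):
    """Keep points whose residual lies within k standard deviations."""
    residuals = y - X @ theta
    keep = np.abs(residuals - residuals.mean()) < k * residuals.std()
    return X[keep], y[keep]

# Typical use: fit once, drop high-residual points, refit
# theta = fit_closed_form(X, y)
# X_clean, y_clean = drop_outliers(X, y, theta)
# theta = fit_closed_form(X_clean, y_clean)
```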

Categorical variables
Example: predict credit card balance from age, income, number of cards, credit limit, and credit rating, plus the categorical variables student (Yes/No) and state (50 different levels)

Indicator Variables
- Binary (two-level) variable: add a new feature $x_j = 1$ if student and 0 otherwise
- Multi-level variable, e.g., State with 50 values: $x_{MA} = 1$ if State = MA and 0 otherwise; $x_{NY} = 1$ if State = NY and 0 otherwise; and so on
- How many indicator variables are needed? (For a $k$-level variable, $k - 1$ suffice, with the remaining level serving as the baseline)
- Disadvantage: the data becomes too sparse for a large number of levels
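A quick sketch of building indicator variables with pandas (the toy data and column names are illustrative; `drop_first=True` implements the k-1 convention above):

```python
import pandas as pd

df = pd.DataFrame({
    "income": [50_000, 80_000, 65_000],
    "student": ["Yes", "No", "Yes"],
    "state": ["MA", "NY", "MA"],
})

# drop_first=True keeps k-1 indicators per k-level variable,
# treating the dropped level as the baseline
encoded = pd.get_dummies(df, columns=["student", "state"], drop_first=True)
print(encoded)
```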

Comparison with ANOVA
- ANOVA: a general statistical method for comparing populations. Example 1: Is the income of MA and NY residents similar? Example 2: Is there any difference between patients with a certain treatment and those without?
- Linear regression: a learning algorithm used for predicting responses on new data. Example 1: Predict the income of US residents. Example 2: Predict survival of patients.
- Hypothesis testing for a coefficient being equal to zero is similar to ANOVA

Outline (recap)
- Multiple linear regression: derivation in matrix form
- Practical issues: feature scaling and normalization; outliers; categorical variables
- Gradient descent (next): an efficient algorithm for optimizing the loss function; training LR with gradient descent

What Strategy to Use?

Follow the Slope
Follow the direction of steepest descent!

How to optimize $J(\theta)$?
[Figures: gradient descent on the loss surface, repeated from a different starting point]

Gradient Descent
The gradient at a point is the slope of the line tangent to the curve at that point

Gradient Descent
What happens when θ reaches a local minimum? The slope is 0, and gradient descent converges!
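Putting the pieces together, a hedged sketch of training linear regression with batch gradient descent, using the gradient $\frac{2}{n} X^T (X\theta - y)$ derived earlier (the learning rate and iteration count are illustrative choices, not values from the slides):

```python
import numpy as np

def gradient_descent_lr(X, y, lr=0.01, n_iters=1000):
    """Minimize J(theta) = (1/n) ||X theta - y||^2 by gradient descent."""
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(n_iters):
        gradient = (2.0 / n) * X.T @ (X @ theta - y)  # dJ/dtheta
        theta -= lr * gradient                        # step downhill
    return theta

# Same synthetic data as the closed-form example; the two should agree
rng = np.random.default_rng(2)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=50)
print(gradient_descent_lr(X, y))  # close to [1.0, -2.0, 0.5]
```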

Review
- The solution for multiple linear regression can be computed in closed form, but matrix inversion is computationally intensive
- Gradient descent is an efficient algorithm for optimization and for training LR; it is the most widely used algorithm in ML, with many variants (SGD, coordinate descent, etc.), and it converges if the objective is convex
- In practice, several techniques can help generate more robust models: outlier removal and feature scaling

Acknowledgements
Slides made using resources from: Andrew Ng, Eric Eaton, David Sontag
Thanks!