
CS 237: Probability in Computing
Wayne Snyder
Computer Science Department, Boston University

Lecture 26: Logistic Regression 2
- Gradient Descent for Linear Regression
- Gradient Descent for Logistic Regression
- Example: Logistic Regression for Heights and Gender
- Putting it all together: Multivariate Logistic Regression

Recall: we were searching for the parameters of this linear regression problem (the slope m and the y-intercept b of the regression line) using the technique of gradient descent.

Basic idea: Define a cost or loss function J(...) which gives the cost or penalty for a given choice of parameters, then search for the parameters which minimize this cost. So let's pretend we didn't have the closed-form formulae for the least-squares line, and suppose we needed to find the parameters by gradient descent. In linear regression this would mean finding the values of m and b which minimize the MSE, in other words which minimize the cost function

    J(m, b) = MSE = (1/N) * sum_{i=1..N} ( y_i - (m*x_i + b) )^2
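To make this concrete, here is a minimal sketch (plain Python, with made-up sample data, not the course data set) that evaluates the MSE cost for a candidate choice of m and b:

def mse_cost(m, b, X, Y):
    """Mean squared error of the line y = m*x + b on the data (X, Y)."""
    N = len(X)
    return sum((Y[i] - (m * X[i] + b)) ** 2 for i in range(N)) / N

# Hypothetical data points
X = [1.0, 2.0, 3.0, 4.0]
Y = [2.1, 3.9, 6.2, 8.1]
print(mse_cost(2.0, 0.0, X, Y))   # cost of the line y = 2x
print(mse_cost(0.0, 0.0, X, Y))   # cost of the line y = 0 (a much worse fit)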

Partial Derivatives: To find a partial derivative of a function with multiple parameters, we find the rate at which the function varies with each parameter (its derivative) in isolation, considering each of the other parameters to effectively be a constant. For the MSE cost these are:

    dJ/dm = (1/N) * sum_{i=1..N} -2*x_i * ( y_i - (m*x_i + b) )
    dJ/db = (1/N) * sum_{i=1..N} -2 * ( y_i - (m*x_i + b) )

We will use b and m as the names of the parameters, and will represent points as column vectors.
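As a quick sanity check, the following sketch (reusing mse_cost and the hypothetical X, Y defined just above) compares the analytic partial derivative with respect to b against a finite-difference estimate that nudges b while holding m constant:

def dJ_db(m, b, X, Y):
    """Analytic partial derivative of the MSE cost with respect to b."""
    N = len(X)
    return sum(-2 * (Y[i] - (m * X[i] + b)) for i in range(N)) / N

def finite_diff_db(m, b, X, Y, h=1e-6):
    """Numerical estimate: vary b by h while treating m as a constant."""
    return (mse_cost(m, b + h, X, Y) - mse_cost(m, b - h, X, Y)) / (2 * h)

print(dJ_db(2.0, 0.0, X, Y))            # analytic value
print(finite_diff_db(2.0, 0.0, X, Y))   # should agree closely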

To find the minimum value along one axis we work with only one of the partial derivatives at a time, say the one for the y-intercept b:

Step One: Choose an initial point b_0.
Step Two: Choose a step size or learning rate λ and a threshold of accuracy ε.
Step Three: Move a distance of λ times the slope along the axis, in the decreasing direction (the negative of the slope), and repeat until the distance moved is less than ε.
Step Four: Output b_(n+1) as the minimum.

Let's look at a notebook showing this...
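Since the notebook itself is not reproduced here, the following is a minimal sketch of that one-dimensional procedure (the function and parameter names are placeholders chosen to match the steps above):

def descend_1d(deriv, b_0, lam, epsilon):
    """Minimize along one axis: step opposite the derivative until the steps become tiny."""
    b = b_0
    while True:
        step = lam * deriv(b)
        b_next = b - step          # move in the decreasing direction
        if abs(step) < epsilon:    # stop when the distance moved is below the threshold
            return b_next
        b = b_next

# Example: minimize f(b) = (b - 3)^2, whose derivative is 2*(b - 3)
print(descend_1d(lambda b: 2 * (b - 3), b_0=0.0, lam=0.1, epsilon=1e-8))   # approaches 3.0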

Gradient Descent for Linear Regression

To find the minimum in multiple dimensions, we simply do the same thing in all dimensions at the same time. Here is the algorithm from the reading:

def update(m, b, X, Y, lam):
    N = len(X)
    m_deriv = 0
    b_deriv = 0
    for i in range(N):
        # partial derivatives of the MSE cost with respect to m and b
        m_deriv += -2 * X[i] * (Y[i] - (m * X[i] + b))
        b_deriv += -2 * (Y[i] - (m * X[i] + b))
    m_deriv /= float(N)
    b_deriv /= float(N)
    # move each parameter opposite its derivative, scaled by the learning rate
    m -= m_deriv * lam
    b -= b_deriv * lam
    return (m, b)

def gradient_descent(m, b, X, Y, lam, epsilon):
    while True:
        (m1, b1) = update(m, b, X, Y, lam)
        # stop when the total movement of the parameters is below the threshold
        if abs(m1 - m) + abs(b1 - b) < epsilon:
            return (m1, b1)
        m, b = m1, b1
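A minimal usage sketch (the data and starting values here are hypothetical, not from the lecture):

# Hypothetical data and starting point
X = [1.0, 2.0, 3.0, 4.0]
Y = [2.1, 3.9, 6.2, 8.1]
m_fit, b_fit = gradient_descent(0.0, 0.0, X, Y, lam=0.01, epsilon=1e-9)
print(m_fit, b_fit)   # should approach the least-squares slope and intercept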

As the parameters are tuned to minimize the cost (which measures how well the parameters fit the data), you get a better and better fit between the model and the data. You can run gradient descent as long as you wish to get a better fit. Obviously, defining the cost function and picking the learning rate and threshold are critical decisions, and much research has been devoted to different cost models and different approaches to gradient descent.

There are dozens of different cost functions that have been defined, and even more variants of algorithms that perform minimization of a given cost function are available in standard Python libraries:
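For example (this illustration is mine, not from the lecture), SciPy's general-purpose minimizer can be handed the same MSE cost directly:

import numpy as np
from scipy.optimize import minimize

# Hypothetical data
X = np.array([1.0, 2.0, 3.0, 4.0])
Y = np.array([2.1, 3.9, 6.2, 8.1])

def cost(params):
    m, b = params
    return np.mean((Y - (m * X + b)) ** 2)

result = minimize(cost, x0=[0.0, 0.0])   # the library chooses its own descent strategy
print(result.x)                          # the fitted (m, b)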

Logistic Regression: The Cost Function

Ok, back to logistic regression! A common cost function used in logistic regression is Cross-Entropy or Log-Loss. If Y is the actual value (0 or 1, from the actual data) and p is the predicted probability P(Y = 1) output by logistic regression, then the cost of this prediction is:

    cost(Y, p) = -( Y * log(p) + (1 - Y) * log(1 - p) )
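In code, this per-prediction cost is just a couple of lines (a minimal sketch; the function name is mine):

import math

def log_loss(y, p):
    """Cross-entropy cost of predicting probability p when the actual value is y (0 or 1)."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

print(log_loss(1, 0.9))   # small cost: confident and correct
print(log_loss(1, 0.1))   # large cost: confident and wrong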

Thus, in our simple example (predicting gender from height) we want to find the y-intercept b and slope m such that the regression line, after being transformed by the sigmoid function s(...), minimizes the following cost, where s(m*x_i + b) is the predicted probability that y_i = 1:

    J(m, b) = -(1/N) * sum_{i=1..N} [ y_i * log( s(m*x_i + b) ) + (1 - y_i) * log( 1 - s(m*x_i + b) ) ]
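Putting the pieces together for the height/gender example, here is a minimal sketch of gradient descent on this cost (the heights and labels below are made-up placeholders, not the course data set):

import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logistic_update(m, b, X, Y, lam):
    """One gradient step on the cross-entropy cost with predictions sigmoid(m*x_i + b)."""
    N = len(X)
    m_deriv = sum((sigmoid(m * X[i] + b) - Y[i]) * X[i] for i in range(N)) / N
    b_deriv = sum((sigmoid(m * X[i] + b) - Y[i]) for i in range(N)) / N
    return (m - lam * m_deriv, b - lam * b_deriv)

# Hypothetical data: heights in inches, label 1 = one gender, 0 = the other
heights = [60, 62, 64, 66, 68, 70, 72, 74]
labels  = [0, 0, 0, 0, 1, 1, 1, 1]
mean_h = sum(heights) / len(heights)
X = [h - mean_h for h in heights]   # center the heights so gradient descent behaves well
m, b = 0.0, 0.0
for _ in range(5000):
    m, b = logistic_update(m, b, X, labels, lam=0.1)
print(m, b)   # s(m*(height - mean_h) + b) now approximates P(label = 1 | height)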

Logistic Regression: Putting it All Together

In the case of m multiple independent variables X_1, ..., X_m and a binary dependent variable Y, where the prediction for the i-th sample (row) using the parameters b_0, b_1, ..., b_m is denoted y_hat_i = s(b_0 + b_1*x_{i,1} + ... + b_m*x_{i,m}), we use the same gradient descent algorithm to find the values of the parameters such that the following cost function is minimized:

    J(b_0, ..., b_m) = -(1/N) * sum_{i=1..N} [ y_i * log(y_hat_i) + (1 - y_i) * log(1 - y_hat_i) ]
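A minimal NumPy sketch of this multivariate version (the array shapes and the tiny synthetic data set are assumptions for illustration only):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lam=0.1, steps=5000):
    """Gradient descent on the cross-entropy cost; X has one row per sample, one column per variable."""
    N, num_vars = X.shape
    Xb = np.hstack([np.ones((N, 1)), X])   # prepend a column of 1s so b_0 acts as the intercept
    b = np.zeros(num_vars + 1)             # parameters b_0, b_1, ..., b_m
    for _ in range(steps):
        y_hat = sigmoid(Xb @ b)            # predicted probabilities for every row
        grad = Xb.T @ (y_hat - y) / N      # gradient of the cost with respect to the parameters
        b = b - lam * grad
    return b

# Hypothetical data: two independent variables, six samples, binary labels
X = np.array([[1.0, 2.0], [2.0, 1.0], [2.5, 3.0], [4.0, 4.5], [5.0, 4.0], [5.5, 6.0]])
y = np.array([0, 0, 0, 1, 1, 1])
print(fit_logistic(X, y))   # the fitted parameters [b_0, b_1, b_2]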

Logistic Regression: Supervised Learning

Even if we have successfully minimized the cost function, surely the cost will not be 0, which means that our model will not be perfect. How do we evaluate our model? We can think of this algorithm as trying to learn the categories (0 or 1) that the independent variables belong to, and use our data itself to test the results. The basic idea is to take a random selection (say 80%) of our data set, find the best-fitting parameters (called training), and then test the model on the remaining 20%. The percentage of the remaining data that is successfully classified is the accuracy of our model.

[Diagram: the data set split into a Train portion (80%) and a Test portion (20%).]
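A minimal end-to-end sketch of this evaluation (the synthetic heights, labels, and split fraction below are placeholders, not the course data):

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical data set: heights and a 0/1 label loosely related to height
heights = rng.normal(67, 4, size=200)
labels = (heights + rng.normal(0, 3, size=200) > 67).astype(int)

# Random 80% / 20% split of the rows
order = rng.permutation(len(heights))
cut = int(0.8 * len(heights))
train, test = order[:cut], order[cut:]

# Train: gradient descent on the cross-entropy cost over the training rows only
mean_h = heights[train].mean()
m, b = 0.0, 0.0
for _ in range(2000):
    p = sigmoid(m * (heights[train] - mean_h) + b)
    m -= 0.1 * np.mean((p - labels[train]) * (heights[train] - mean_h))
    b -= 0.1 * np.mean(p - labels[train])

# Test: classify a held-out row as 1 when its predicted probability exceeds 0.5
predicted = (sigmoid(m * (heights[test] - mean_h) + b) > 0.5).astype(int)
print(np.mean(predicted == labels[test]))   # the accuracy of the model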