Multi-Class Logistic Regression and Perceptron


Multi-Class Logistic Regression and Perceptron. Instructor: Wei Xu. Some slides adapted from Dan Jurafsky, Brendan O'Connor, and Marine Carpuat.

Multi-Class Classification. Q: What if we have more than 2 categories? Sentiment: Positive, Negative, Neutral. Document topics: Sports, Politics, Business, Entertainment, ... Q: How can we easily do multi-label classification?

Two Types of Multi-Class Classification. Multi-label classification: each instance can be assigned more than one label. Multinomial classification: each instance belongs to exactly one class (the classes are mutually exclusive).

Multinomial Classification: pretty straightforward with Naive Bayes.

Log-Linear Models

Multinomial Logistic Regression: a normalization term (Z) ensures that the probabilities sum to 1.
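In symbols, with one weight vector w_c per class c (a standard formulation; the slide's formula did not survive transcription):

P(y = c \mid x) = \frac{\exp(w_c \cdot x)}{Z(x)}, \qquad Z(x) = \sum_{c'} \exp(w_{c'} \cdot x)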

(a.k.a.) Softmax Regression

Q: What if there are only 2 categories? Then the model reduces to the sigmoid (logistic) function.
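Dividing the two-class softmax through by \exp(w_1 \cdot x) shows the reduction (same notation as above; note that only the difference of the two weight vectors matters):

P(y = 1 \mid x) = \frac{\exp(w_1 \cdot x)}{\exp(w_1 \cdot x) + \exp(w_0 \cdot x)} = \frac{1}{1 + \exp\bigl(-(w_1 - w_0) \cdot x\bigr)} = \sigma\bigl((w_1 - w_0) \cdot x\bigr)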

Multinomial Logistic Regression. Binary (two classes): we have a single weight vector that matches the size of the vocabulary. Multi-class in practice: one weight vector for each category. In practice, this can be represented with one giant weight vector and the features repeated for each category.
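A small NumPy sketch of these two equivalent representations (all names here are hypothetical, not from the slides): scoring with one weight vector per class versus one giant concatenated weight vector with a per-class block of repeated features.

import numpy as np

num_classes, num_features = 3, 5
rng = np.random.default_rng(0)
W = rng.standard_normal((num_classes, num_features))  # one weight vector per class
x = rng.standard_normal(num_features)                 # feature vector

# View 1: one weight vector per category.
scores = W @ x                                        # one score per class

# View 2: one giant weight vector; the features are repeated
# in the block that belongs to the candidate class.
w_giant = W.ravel()

def joint_features(x, c, num_classes):
    # Place x in block c of a (num_classes * len(x))-long vector.
    f = np.zeros(num_classes * len(x))
    f[c * len(x):(c + 1) * len(x)] = x
    return f

scores2 = np.array([w_giant @ joint_features(x, c, num_classes)
                    for c in range(num_classes)])
assert np.allclose(scores, scores2)                   # identical scores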

Maximum Likelihood Estimation
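For multinomial LR, the quantity being maximized is the conditional log-likelihood of the training data (a standard formulation, using the softmax notation above):

\ell(w) = \sum_{i=1}^{N} \log P(y_i \mid x_i) = \sum_{i=1}^{N} \bigl( w_{y_i} \cdot x_i - \log Z(x_i) \bigr)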

Multiclass LR Gradient: empirical feature count minus expected feature count.
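Written out in the notation above (a reconstruction of the standard result the slide summarizes):

\frac{\partial \ell}{\partial w_c} = \underbrace{\sum_{i} \mathbb{1}[y_i = c] \, x_i}_{\text{empirical feature count}} - \underbrace{\sum_{i} P(y = c \mid x_i) \, x_i}_{\text{expected feature count}}

Setting this gradient to zero says the model's expected feature counts must match the counts observed in the training data, which is the MaxEnt view mentioned next.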

(a.k.a.) Maximum Entropy Classifier, or MaxEnt. Mathematical proofs that LR = MaxEnt: [Klein and Manning 2003]; [Mount 2011] http://www.win-vector.com/dfiles/logisticregressionmaxent.pdf

Perceptron Algorithm: very similar to logistic regression, but not exactly computing the gradient. [Rosenblatt 1957] http://www.peterasaro.org/writing/neural_networks.html

Perceptron Algorithm: very similar to logistic regression, but not exactly computing the gradient. Its update is simpler: it is driven by the sign of the weighted sum of features rather than by a probability.

Perceptron vs. LR: The Perceptron is an online learning algorithm. Standard Logistic Regression is not.

Online Learning: The Perceptron is an online learning algorithm; standard Logistic Regression is not. The perceptron's per-example update is effectively the same as w += y_i * x_i.
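For contrast, the per-example stochastic gradient update for binary LR, with labels y_i in {-1, +1} and learning rate \eta, is (a standard result, reconstructed here rather than copied from the slide):

w \leftarrow w + \eta \, \bigl(1 - \sigma(y_i \, w \cdot x_i)\bigr) \, y_i \, x_i

The perceptron replaces the soft factor (1 - \sigma(\cdot)) with a hard 0/1 mistake indicator, which is why it needs no learning rate.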

(Full) Batch Learning: update parameters after each pass over the training set.

Initialize weight vector w = 0
Create features
Loop for K iterations:
    Loop for all training examples x_i, y_i:
        accumulate the update for (x_i, y_i)
    update_weights(w)

Online Learning: update parameters after each training example.

Initialize weight vector w = 0
Create features
Loop for K iterations:
    Loop for all training examples x_i, y_i:
        update_weights(w, x_i, y_i)

Perceptron Algorithm: very similar to logistic regression, but not exactly computing the gradient. Given the features of a training example x, the weights w, and the label y: if y = 1, increase the weights for the features in x; if y = -1, decrease the weights for the features in x.

Perceptron Algorithm: very similar to logistic regression, but not exactly computing the gradient. Error-driven!

Initialize weight vector w = 0
Loop for K iterations:
    Loop for all training examples x_i, y_i:
        if sign(w * x_i) != y_i:
            w += y_i * x_i
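Below is a minimal runnable version of this pseudocode in NumPy (function and variable names are mine, not the slides'; labels are assumed to be in {-1, +1}):

import numpy as np

def train_perceptron(X, y, K=10):
    # X: (n_examples, n_features) array; y: labels in {-1, +1}.
    w = np.zeros(X.shape[1])              # initialize weight vector w = 0
    for _ in range(K):                    # loop for K iterations
        for x_i, y_i in zip(X, y):        # loop over training examples
            if np.sign(w @ x_i) != y_i:   # error-driven: update only on mistakes
                w += y_i * x_i
    return w

# Toy usage: two linearly separable clusters.
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -2.0], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])
w = train_perceptron(X, y)
print(np.sign(X @ w))                     # reproduces y

On this toy data the loop makes one update during the first pass and none afterwards, which is exactly the convergence behavior discussed below.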

The Intuition: for a given example, the perceptron makes a prediction, then checks whether that prediction is correct. If the prediction is correct, do nothing. If the prediction is incorrect, change the parameters so that it would do better on this example next time around.

Perceptron (vs. LR): its only hyperparameter is the maximum number of iterations (LR also needs a learning rate). It is guaranteed to converge if the data is linearly separable (LR always converges).

Linear Separability

What does converge mean? It means that the algorithm can make an entire pass through the training data without making any more updates; in other words, it has correctly classified every training example. Geometrically, this means that it has found some hyperplane that correctly separates the data into positive and negative examples.

What if the data is not linearly separable? In real-world problems, this is nearly always the case, and the perceptron will not converge. Q: Then when should we stop?