Classification: Linear Discriminant Functions

Classification: Linear Discriminant Functions CE-725: Statistical Pattern Recognition Sharif University of Technology Spring 2013 Soleymani

Outline
- Discriminant functions
- Linear discriminant functions
  - Least mean squared error method
  - Sum of squared error method
  - Perceptron
- Multi-class problems
  - Linear machine
  - Completely linearly separable
  - Pairwise linearly separable
- Generalized LDFs

Classification Problem
Given: a training set of labeled input-output pairs $D = \{(x^{(i)}, y^{(i)})\}_{i=1}^{N}$, $y^{(i)} \in \{1, \dots, C\}$.
Goal: given an input $x$, assign it to one of the $C$ classes.

Types of Classifiers
- Probabilistic classification approaches (previous lectures): generative and discriminative.
- Discriminant functions: various procedures exist for determining them (some of them statistical); however, they don't require knowledge of the forms of the underlying probability distributions.

Discriminant Functions
A popular way of representing a classifier: a discriminant function $g_i(x)$ for each class $i = 1, \dots, C$; $x$ is assigned to class $i$ if $g_i(x) > g_j(x)$ for all $j \neq i$.
Decision surfaces (boundaries) can also be found using the discriminant functions. Boundary of the regions $\mathcal{R}_i$ and $\mathcal{R}_j$: $\{x : g_i(x) = g_j(x)\}$.

Probabilistic Discriminant Functions
- Maximum likelihood: $g_i(x) = p(x \mid C_i)$
- Bayesian classifier: $g_i(x) = P(C_i \mid x)$
- Expected loss (conditional risk): $g_i(x) = -R(\alpha_i \mid x)$

Discriminant Functions: Two-Category
For the two-category problem, a single function $g: \mathbb{R}^d \to \mathbb{R}$ suffices: $g(x) = g_1(x) - g_2(x)$. Decision surface: $g(x) = 0$.
First we explain the two-category classification problem and then discuss multi-category problems.

Linear Discriminant Functions
Linear Discriminant Functions (LDFs): decision boundaries that are linear in $x$, or linear in some given set of functions of $x$.
Why LDFs?
- They can be optimal for some problems, e.g., if the underlying distributions $p(x \mid C_i)$ are Gaussians having equal covariance matrices.
- Even when they are not optimal, we can exploit their simplicity: LDFs are relatively easy to compute, and in the absence of information suggesting otherwise, linear classifiers are attractive candidates for initial, trial classifiers.

LDFs: Two Category
$g(x; w) = w^T x + w_0$, where $w_0$ is the bias; $w$ contains the parameters we need to set.
Augmented notation: $w = [w_0, w_1, \dots, w_d]^T$ and $x = [1, x_1, \dots, x_d]^T$, so that $g(x; w) = w^T x$.
Decision rule: if $g(x) \geq 0$ then decide $C_1$, else decide $C_2$.
Decision surface (boundary): the equation $g(x; w) = 0$ defines the decision surface separating samples of the two categories. When $g(x; w)$ is linear, the decision surface is a hyperplane.

LDFs: Two Category
The decision boundary $H$ is a $(d-1)$-dimensional hyperplane in the $d$-dimensional feature space. The orientation of $H$ is determined by the normal vector $w$; the location of the surface is determined by the bias $w_0$.
Writing $x = x_p + r \frac{w}{\|w\|}$, where $x_p$ is the projection of $x$ onto $H$, gives $g(x) = r \|w\|$, so $g(x)$ is proportional to the signed distance $r$ from $x$ to $H$.
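To make the geometry concrete, here is a minimal NumPy sketch (the test point is invented for illustration) verifying that $g(x)/\|w\|$ equals the signed distance from $x$ to the hyperplane:

```python
import numpy as np

# Weights from the slide's example: boundary 0.75*x1 + x2 - 3 = 0.
w = np.array([0.75, 1.0])   # normal vector, determines orientation
w0 = -3.0                   # bias, determines location

def g(x):
    return w @ x + w0

x = np.array([4.0, 2.0])    # an arbitrary test point

# Signed distance from x to the hyperplane: r = g(x) / ||w||
r = g(x) / np.linalg.norm(w)

# Check: stepping back by r along the unit normal lands on the surface,
# confirming the decomposition x = x_p + r * w/||w||.
x_p = x - r * w / np.linalg.norm(w)
print(r)                          # signed distance (positive side here)
print(np.isclose(g(x_p), 0.0))    # x_p lies on the decision surface
```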

LDFs: Two Category
[Figure: a two-dimensional example with augmented input $x = [1, x_1, x_2]^T$ and $w = [-3, 0.75, 1]^T$, i.e., the boundary $-3 + 0.75\,x_1 + x_2 = 0$; decide $y = 1$ if $g(x) \geq 0$, else $y = -1$.]

LDFs: Cost Function
Finding LDFs is formulated as an optimization problem: a cost function is defined, and an optimization procedure is used to minimize it.
Criterion or cost functions for classification measure the average training error or loss incurred in classifying the training samples. A small training error does not guarantee a small test error.
We will investigate several cost functions for the classification problem.

LDFs: Methods
Many classification methods are based on LDFs:
- Mean Squared Error
- Sum of Squared Error
- Perceptron
- Fisher Linear Discriminant Analysis (LDA) [next lectures]
- SVM [next lecture]

Main Steps in Methods Based on LDFs
- We have specified the class of discriminant functions as linear.
- Select how to measure the prediction loss: based on the training set $D = \{(x^{(i)}, y^{(i)})\}_{i=1}^{N}$, a cost function $J(w)$ is defined (e.g., SSE) that is a function of the classifier parameters.
- Solve the resulting optimization problem to find the parameters: $\hat{y} = g(x; \hat{w})$, where $\hat{w} = \arg\min_w J(w)$.

Bayes vs. LDFs Cost Function
The Bayes minimum error classifier minimizes the probability of misclassification over all possible classifiers $g(\cdot)$. If we know the probabilities in advance, this optimization problem is solved easily: $\hat{y} = \arg\max_i P(C_i \mid x)$.
We only have a set of training samples instead of $p(x, y)$, however, and so we usually optimize an empirical cost instead: $\min_w J(w)$ over the parametric family $g(\cdot; w)$.

Mean Squared Error
Squared loss: $\big(g(x; w) - y\big)^2$; cost: $J(w) = \mathbb{E}_{x,y}\big[(g(x; w) - y)^2\big]$.
Two-category targets: $y \in \{-1, 1\}$, with $y = -1$ for $x \in C_2$ and $y = 1$ for $x \in C_1$.

Sum of Squared Error (SSE)
Cost function: prediction errors on the training set, i.e., the empirical loss on the training samples:
$J(w) = \sum_{i=1}^{N} \big( w^T x^{(i)} - y^{(i)} \big)^2$

Sum of Squared Error (SSE)
In matrix form, with $X = [x^{(1)}, \dots, x^{(N)}]^T$ (one augmented sample per row) and $y = [y^{(1)}, \dots, y^{(N)}]^T$:
$J(w) = \|Xw - y\|^2$
$\nabla_w J(w) = 2 X^T (Xw - y) = 0 \;\Rightarrow\; \hat{w} = (X^T X)^{-1} X^T y = X^\dagger y$
where $X^\dagger = (X^T X)^{-1} X^T$ is the pseudo-inverse of $X$.
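A minimal sketch of this closed-form solution (the toy data are invented for illustration; NumPy's `pinv` computes the pseudo-inverse):

```python
import numpy as np

# Toy two-category data (invented): rows are samples, columns are features.
X_raw = np.array([[2.0, 1.0], [1.0, 3.0], [4.0, 4.0], [5.0, 2.5]])
y = np.array([-1.0, -1.0, 1.0, 1.0])   # targets in {-1, +1}

# Augment with a leading 1 so the bias is absorbed into w.
X = np.hstack([np.ones((X_raw.shape[0], 1)), X_raw])

# Closed-form SSE solution: w = pinv(X) @ y = (X^T X)^{-1} X^T y.
w = np.linalg.pinv(X) @ y

# Classify by the sign of the linear discriminant g(x) = w^T x.
predictions = np.sign(X @ w)
print(w, predictions)
```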

Sum of Squared Error (SSE)
SSE penalizes "too correct" predictions: samples that lie a long way on the correct side of the decision boundary (e.g., $w^T x \gg 1$ while the target is $y = 1$) still incur a large loss. It also lacks robustness to noise.
[Figure from Bishop: correct predictions that are penalized by SSE.]

Perceptron
Two-category targets: $y = 1$ for $x \in C_1$, $y = -1$ for $x \in C_2$.
Goal: $\operatorname{sign}(w^T x^{(i)}) = y^{(i)}$ for all $i$, where $\operatorname{sign}(z) = -1$ for $z < 0$ and $+1$ for $z \geq 0$.

Perceptron Criterion
Only misclassified training samples affect the discriminant function:
$J_P(w) = -\sum_{i \in \mathcal{M}} y^{(i)} w^T x^{(i)}$
where $\mathcal{M}$ is the subset of training data that are misclassified.
For linearly separable data there are many solutions; which one among them should we prefer?

Perceptron vs. Other Criteria
[Figure from Duda, Hart & Stork: the misclassification-count criterion (piecewise constant in $w$) compared with the perceptron criterion (piecewise linear and continuous).]

Some Classification Criteria
[Figure from Duda, Hart & Stork: the perceptron and SSE criteria as functions of the weights.]

Batch Perceptron
Gradient descent to solve the optimization problem: $\nabla_w J_P(w) = -\sum_{i \in \mathcal{M}} y^{(i)} x^{(i)}$.

Initialize $w$, $t \leftarrow 0$
Repeat
  $w = w + \eta \sum_{i \in \mathcal{M}} y^{(i)} x^{(i)}$
  $t \leftarrow t + 1$
Until $\big\| \eta \sum_{i \in \mathcal{M}} y^{(i)} x^{(i)} \big\| < \theta$

The batch perceptron converges in a finite number of steps for linearly separable data.
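A minimal NumPy sketch of the batch update, assuming augmented inputs and ±1 targets (the function name and default arguments are our own):

```python
import numpy as np

def batch_perceptron(X, y, eta=1.0, theta=1e-6, max_iter=1000):
    """Batch perceptron: gradient descent on the perceptron criterion.

    X: (N, d+1) augmented samples; y: (N,) targets in {-1, +1}.
    Each step moves w by eta * sum of y_i * x_i over the currently
    misclassified set M, stopping when the update is smaller than theta.
    """
    w = np.zeros(X.shape[1])
    for _ in range(max_iter):
        misclassified = np.sign(X @ w) != y          # the set M
        update = eta * (y[misclassified, None] * X[misclassified]).sum(axis=0)
        if np.linalg.norm(update) < theta:           # stopping criterion
            break
        w = w + update
    return w
```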

Single-Sample Perceptron
If $x^{(i)}$ is misclassified: $w = w + \eta\, y^{(i)} x^{(i)}$ ($\eta$ can be set to 1).

Fixed-Increment Single-Sample Perceptron:
Initialize $w$, $i \leftarrow 0$
Repeat
  $i \leftarrow (i + 1) \bmod N$
  if $x^{(i)}$ is misclassified then $w = w + y^{(i)} x^{(i)}$
Until all patterns are properly classified
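A corresponding sketch of the fixed-increment single-sample variant (again with an invented function name, and $\eta = 1$):

```python
import numpy as np

def single_sample_perceptron(X, y, max_epochs=100):
    """Fixed-increment single-sample perceptron (eta = 1).

    Cycles through the samples; whenever x_i is misclassified,
    applies w = w + y_i * x_i. Stops after a full pass with no errors.
    """
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        errors = 0
        for x_i, y_i in zip(X, y):
            if np.sign(x_i @ w) != y_i:    # misclassified sample
                w = w + y_i * x_i          # fixed-increment update
                errors += 1
        if errors == 0:                    # all patterns properly classified
            break
    return w
```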

Convergence of Perceptron
[Figure from Bishop: each update moves the weight vector in a direction that corrects the error on the current sample.]

Convergence of Perceptron
If the training data set is linearly separable, the single-sample perceptron algorithm is also guaranteed to find a solution in a finite number of steps. [Duda, Hart & Stork]

LDFs: Multi-Category
Linear discriminant functions for multi-category problems:
- Linear machine: a discriminant function $g_i(x)$ for each class.
- Converting the problem to a set of two-class problems (see the sketch below):
  - One versus rest (one against all): $C$ LDFs, where for each class $C_i$ an LDF separates samples of $C_i$ from all the other samples ("totally linearly separable").
  - One versus one: $C(C-1)/2$ LDFs are used, one to separate the samples of each pair of classes ("pairwise linearly separable").
Converting the problem to a set of two-class problems can lead to regions in which the classification is undefined.
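A sketch of the one-versus-rest decomposition, assuming any two-category trainer such as the `single_sample_perceptron` sketch above (the wrapper and its names are our own):

```python
import numpy as np

def one_vs_rest(X, labels, n_classes, train_binary):
    """Train one binary LDF per class: class i vs. all other samples.

    train_binary(X, y) is any two-category trainer returning a weight
    vector (e.g., the perceptron or SSE sketches above). Predicting with
    the argmax of the C discriminants also resolves the regions that the
    raw one-vs-rest rule would leave undefined.
    """
    W = np.stack([train_binary(X, np.where(labels == i, 1.0, -1.0))
                  for i in range(n_classes)])
    return W  # shape (C, d+1); predict a sample x with np.argmax(W @ x)
```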

Multi-Category Classification: One-vs-All (One-vs-Rest)
[Figure: one-vs-rest decision boundaries for three classes; each LDF separates one class from the remaining samples.]

Multi-Category Classification: One-vs-One
[Figure: one-vs-one decision boundaries for three classes; one LDF per pair of classes.]

Multi-Category Classification: Ambiguity
[Figure from Duda, Hart & Stork: regions left ambiguous by the one-versus-rest and one-versus-one decompositions.]

Multi-Category Classification: Linear Machine
A linear discriminant function for each class: $g_i(x) = w_i^T x + w_{i0}$; $x$ is assigned to class $i$ if $g_i(x) > g_j(x)$ for all $j \neq i$.
Decision surfaces (boundaries) can also be found using the discriminant functions. Boundary of the contiguous regions $\mathcal{R}_i$ and $\mathcal{R}_j$: $\{x : g_i(x) = g_j(x)\}$, i.e., $(w_i - w_j)^T x + (w_{i0} - w_{j0}) = 0$.

Multi-Category Classification: Linear Machine
[Figure from Duda, Hart & Stork: decision regions produced by a linear machine.]

Multi-Category Classification: Linear Machine
The decision regions of a linear machine are convex; linear machines are therefore most suitable for problems where the class-conditional densities $p(x \mid C_i)$ are unimodal.
Convex region definition: $\forall x_A, x_B \in \mathcal{R}$ and $0 \leq \lambda \leq 1$: $\lambda x_A + (1 - \lambda) x_B \in \mathcal{R}$.
Proof of convexity: if $x_A, x_B \in \mathcal{R}_i$, then $g_i(x_A) > g_j(x_A)$ and $g_i(x_B) > g_j(x_B)$ for all $j \neq i$. Since each $g_i$ is linear, for $0 \leq \lambda \leq 1$:
$g_i(\lambda x_A + (1-\lambda) x_B) = \lambda g_i(x_A) + (1-\lambda) g_i(x_B) > \lambda g_j(x_A) + (1-\lambda) g_j(x_B) = g_j(\lambda x_A + (1-\lambda) x_B)$
so $\lambda x_A + (1-\lambda) x_B \in \mathcal{R}_i$.

Multi-Category Classification: Target Coding Scheme
Target values:
- Binary classification: a single target variable $y \in \{0, 1\}$.
- Multiple classes ($C > 2$): one-of-$C$ (one-hot) coding, $y = [y_1, \dots, y_C]^T$ with $y_i = 1$ if $x \in C_i$ and $y_j = 0$ for $j \neq i$.

SSE Cost Function: Multi-Class
Collect one weight vector per class as the columns of $W = [w_1, \dots, w_C]$ and the one-hot targets as the rows of $Y$:
$J(W) = \sum_{i=1}^{N} \big\| W^T x^{(i)} - y^{(i)} \big\|^2 = \| XW - Y \|_F^2$
$\hat{W} = (X^T X)^{-1} X^T Y = X^\dagger Y$, where $X^\dagger$ is the pseudo-inverse of $X$.
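A minimal sketch combining the one-hot coding with the pseudo-inverse solution (the toy data are invented for illustration):

```python
import numpy as np

# Toy 3-class data (invented); X is already augmented with a leading 1.
X = np.array([[1, 0.0, 0.2], [1, 0.1, 0.9], [1, 0.9, 0.8], [1, 1.0, 0.1]])
labels = np.array([0, 1, 2, 2])

# One-hot (1-of-C) target matrix Y: row i has a 1 in column labels[i].
Y = np.eye(3)[labels]

# Closed-form multi-class SSE solution: W = pinv(X) @ Y, one column per class.
W = np.linalg.pinv(X) @ Y

# Linear machine decision rule: pick the class with the largest discriminant.
predictions = np.argmax(X @ W, axis=1)
print(predictions)
```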

SSE Cost Function: Multi-Class
The SSE cost function performs poorly for the classification problem.
[Figure from Bishop: decision boundaries illustrating the weak performance of least squares for classification.]

Perceptron: Multi-Class
$J_P(w_1, \dots, w_C) = \sum_{i \in \mathcal{M}} \big( w_j^T x^{(i)} - w_{y^{(i)}}^T x^{(i)} \big)$, where $j$ is the (wrong) predicted class of $x^{(i)}$ and $\mathcal{M}$ is the subset of training data that are misclassified.

Initialize $w_1, \dots, w_C$, $i \leftarrow 0$
Repeat
  $i \leftarrow (i + 1) \bmod N$
  if $x^{(i)}$ is misclassified as class $j$ then
    $w_j = w_j - \eta\, x^{(i)}$
    $w_{y^{(i)}} = w_{y^{(i)}} + \eta\, x^{(i)}$
Until all patterns are properly classified
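A sketch of this multi-class update rule (the function name and epoch-based stopping rule are our own):

```python
import numpy as np

def multiclass_perceptron(X, labels, n_classes, eta=1.0, max_epochs=100):
    """Single-sample multi-class perceptron.

    Keeps one weight vector per class. When x_i is misclassified as
    class j, it reinforces the true class and penalizes the wrong one:
    w_true += eta * x_i and w_j -= eta * x_i.
    """
    W = np.zeros((n_classes, X.shape[1]))
    for _ in range(max_epochs):
        errors = 0
        for x_i, y_i in zip(X, labels):
            j = int(np.argmax(W @ x_i))    # predicted class
            if j != y_i:                   # misclassified sample
                W[y_i] += eta * x_i
                W[j] -= eta * x_i
                errors += 1
        if errors == 0:                    # all patterns properly classified
            break
    return W
```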

Generalized LDFs
A linear combination of fixed non-linear functions of the input vector:
$g(x) = w^T \phi(x)$, where $\phi(x) = [\phi_1(x), \dots, \phi_m(x)]^T$ is a set of basis functions (or features).
We will discuss them in the next lectures.

Generalized LDFs: Example
Choose non-linear features; the discriminant functions are still linear in the parameters.
With $\phi(x) = [1, x_1, x_2, x_1^2, x_2^2, x_1 x_2]^T$ and $w = [-1, 0, 0, 1, 1, 0]^T$, the boundary is $-1 + x_1^2 + x_2^2 = 0$, the unit circle in the $x = [x_1, x_2]^T$ plane; if $g(x) \geq 0$ then $y = 1$, else $y = -1$.
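A minimal sketch of this example (the helper names are our own; the basis functions and weights are taken from the slide):

```python
import numpy as np

def phi(x):
    """Quadratic basis functions from the slide's example."""
    x1, x2 = x
    return np.array([1.0, x1, x2, x1**2, x2**2, x1 * x2])

# Weights from the slide: boundary -1 + x1^2 + x2^2 = 0, the unit circle.
w = np.array([-1.0, 0.0, 0.0, 1.0, 1.0, 0.0])

def classify(x):
    # g(x) = w^T phi(x) is non-linear in x but still linear in w.
    return 1 if w @ phi(x) >= 0 else -1

print(classify(np.array([0.2, 0.3])))   # inside the circle  -> -1
print(classify(np.array([1.5, 0.0])))   # outside the circle -> +1
```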