SYDE 372 Introduction to Pattern Recognition. Distance Measures for Pattern Classification: Part I

SYDE 372 Introduction to Pattern Recognition Distance Measures for Pattern Classification: Part I Alexander Wong Department of Systems Design Engineering University of Waterloo

Outline: 1. Distance Measures for Pattern Classification

Distance measures for pattern classification. Intuitively, two patterns that are sufficiently similar should be assigned to the same class. But what does "similar" mean? How similar are these patterns quantitatively? How similar are they to a particular class quantitatively? Since we represent patterns quantitatively as vectors in a feature space, it is often possible to: use some measure of similarity between two patterns to quantify how similar their attributes are, and use some measure of similarity between a pattern and a prototype to quantify how similar the pattern is to a class.

Minimum Euclidean Distance (MED) Classifier
Definition: $x \in c_k$ iff $d_E(x, z_k) < d_E(x, z_l)$ for all $l \neq k$, (1)
where $d_E(x, z_k) = [(x - z_k)^T (x - z_k)]^{1/2}$. (2)
Meaning: x belongs to class k if and only if the Euclidean distance between x and the prototype of $c_k$ is less than the distance between x and all other class prototypes.
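
As a concrete illustration, here is a minimal NumPy sketch of the MED decision rule; the function and variable names are our own and are not from the course notes:

```python
import numpy as np

def med_classify(x, prototypes):
    """Assign x to the class whose prototype z_k is nearest in Euclidean distance.

    x          : feature vector, shape (n,)
    prototypes : class prototypes z_k stacked as rows, shape (K, n)
    Returns the index k of the winning class.
    """
    # d_E(x, z_k) = [(x - z_k)^T (x - z_k)]^{1/2} for every class k
    dists = np.linalg.norm(prototypes - x, axis=1)
    return int(np.argmin(dists))

# Example: two prototypes in a 2-D feature space
z = np.array([[3.0, 3.0],   # prototype of class 1
              [4.0, 5.0]])  # prototype of class 2
print(med_classify(np.array([2.0, 2.0]), z))  # -> 0 (class 1)
```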

MED Classifier: Visualization

MED Classifier: Discriminant Function
Squaring both sides of the decision criterion $d_E(x, z_k) < d_E(x, z_l)$ and cancelling the common $x^T x$ term gives us:
$-z_k^T x + \frac{1}{2} z_k^T z_k < -z_l^T x + \frac{1}{2} z_l^T z_l$ (3)
This gives us the discriminant/decision function:
$g_k(x) = -z_k^T x + \frac{1}{2} z_k^T z_k$ (4)
Therefore, MED classification is made based on the minimum discriminant for a given x:
$x \in c_k$ iff $g_k(x) < g_l(x)$ for all $l \neq k$ (5)

MED Classifier: Decision Boundary
Formed by the feature vectors equidistant from two class prototypes ($g_k(x) = g_l(x)$):
$g(x) = g_k(x) - g_l(x) = 0$ (6)
For the MED classifier, the decision boundary becomes:
$g(x) = (z_k - z_l)^T x + \frac{1}{2}(z_l^T z_l - z_k^T z_k) = 0$ (7)
$g(x) = w^T x + w_0 = 0$, with $w = z_k - z_l$ and $w_0 = \frac{1}{2}(z_l^T z_l - z_k^T z_k)$ (8)
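
As a sketch of how the boundary parameters in (8) could be computed for a pair of prototypes (again with our own function names, assuming NumPy):

```python
import numpy as np

def med_boundary(z_k, z_l):
    """Return (w, w0) such that the MED decision boundary between
    classes k and l is the hyperplane w^T x + w0 = 0."""
    w = z_k - z_l
    w0 = 0.5 * (z_l @ z_l - z_k @ z_k)
    return w, w0

# Example with the prototypes of the worked example later in the lecture
w, w0 = med_boundary(np.array([3.0, 3.0]), np.array([4.0, 5.0]))
print(w, w0)  # [-1. -2.] 11.5, i.e. -x1 - 2*x2 + 11.5 = 0, or x2 = -x1/2 + 23/4
```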

MED Classifier: Decision Boundary Visualization
The MED decision boundary is just a hyperplane with normal vector w, lying at a distance $|w_0| / \lVert w \rVert$ from the origin.

So far, we have assumed that we have a specific prototype $z_k$ for each class. But how do we select such a class prototype? The choice of class prototype will affect the way the classifier behaves. Let us study the classification problem where: we have a set of samples with known classes $c_k$, and we need to determine a class prototype for each class based on these labeled samples.

Common prototypes: Sample Mean
$z_k = \frac{1}{N_k} \sum_{i=1}^{N_k} x_i$ (9)
where $N_k$ is the number of samples in class $c_k$ and $x_i$ is the $i$-th sample of $c_k$.
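
A minimal sketch of computing sample-mean prototypes from labeled data (array and function names are our own):

```python
import numpy as np

def sample_mean_prototypes(X, y):
    """Compute z_k = (1/N_k) * sum of the samples labeled k.

    X : samples, shape (N, n)
    y : integer class labels, shape (N,)
    Returns a (K, n) array whose rows are the class prototypes.
    """
    classes = np.unique(y)
    return np.vstack([X[y == k].mean(axis=0) for k in classes])
```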

Common prototypes: Sample Mean Advantages: + Less sensitive to noise and outliers Disadvantages: - Poor at handling long, thin, tendril-like clusters

Common prototypes: Nearest Neighbor (NN)
Definition: $z_k(x) = x_k$ such that $d_E(x, x_k) = \min_i d_E(x, x_i)$, $x_i \in c_k$ (10)
Meaning: For a given x you wish to classify, you compute the distance between x and all labeled samples, and you assign x the same class as its nearest neighbor.
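
With the NN prototype, MED classification reduces to assigning x the label of its single nearest labeled sample. A minimal sketch (names are our own):

```python
import numpy as np

def nn_classify(x, X, y):
    """1-NN rule: each class's NN prototype is its sample closest to x,
    so x simply takes the label of the overall nearest labeled sample."""
    dists = np.linalg.norm(X - x, axis=1)
    return y[np.argmin(dists)]
```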

Common prototypes: Nearest Neighbor (NN)

Common prototypes: Nearest Neighbor (NN) Advantages: + Better at handling long, thin, tendril-like clusters Disadvantages: - More sensitive to noise and outliers - Computationally complex (need to re-compute all prototypes for each new point)

Common prototypes: Furthest Neighbor (FNN)
Definition: $z_k(x) = x_k$ such that $d_E(x, x_k) = \max_i d_E(x, x_i)$, $x_i \in c_k$ (11)
Meaning: For a given x you wish to classify, you compute the distance between x and all labeled samples, and you define the prototype in each cluster as the point furthest from x.
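
The corresponding sketch for the FNN prototype: each class is represented by its sample furthest from x, and x is assigned to the class whose furthest sample is closest (names are our own):

```python
import numpy as np

def fnn_classify(x, X, y):
    """MED with furthest-neighbor prototypes: represent each class by its
    sample furthest from x, then pick the class whose prototype is nearest."""
    classes = np.unique(y)
    # distance from x to the furthest sample of each class
    proto_dists = [np.linalg.norm(X[y == k] - x, axis=1).max() for k in classes]
    return classes[int(np.argmin(proto_dists))]
```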

Common prototypes: Furthest Neighbor (FNN)

Common prototypes: Furthest Neighbor (FNN) Advantages: + Yields tighter, more compact clusters Disadvantages: - More sensitive to noise and outliers - Computationally complex (need to re-compute all prototypes for each new point)

Common prototypes: K-nearest Neighbor Idea: The nearest neighbor prototype is sensitive to noise but handles long, tendril-like clusters well; the sample mean is less sensitive to noise but handles long, tendril-like clusters poorly. What if we combine the two ideas? Definition: For a given x you wish to classify, you compute the distance between x and all labeled samples, and you define the prototype in each cluster as the sample mean of the K samples within that cluster that are nearest to x.
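
A sketch of this K-nearest-neighbor prototype rule: for each class, average its K samples closest to x, then classify x by minimum Euclidean distance to those prototypes (function and parameter names are our own):

```python
import numpy as np

def knn_prototype_classify(x, X, y, K=3):
    """For each class, the prototype is the mean of its K samples nearest to x;
    x is assigned to the class whose prototype is closest."""
    classes = np.unique(y)
    proto_dists = []
    for k in classes:
        Xk = X[y == k]
        dists = np.linalg.norm(Xk - x, axis=1)
        nearest = Xk[np.argsort(dists)[:K]]   # K samples of class k nearest to x
        z_k = nearest.mean(axis=0)            # K-NN prototype of class k
        proto_dists.append(np.linalg.norm(x - z_k))
    return classes[int(np.argmin(proto_dists))]
```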

Common prototypes: K-nearest Neighbor

Common prototypes: K-nearest Neighbor Advantages: + Less sensitive to noise and outliers + Better at handling long, thin, tendril-like clusters Disadvantages: - Computationally complex (need to re-compute all prototypes for each new point)

Example: Performance from using different prototypes Features are Gaussian in nature, different means, uncorrelated, equal variances:

Example: Performance from using different prototypes Euclidean distance from sample mean for class A:

Example: Performance from using different prototypes Euclidean distance from sample mean for class B:

Example: Performance from using different prototypes MED decision boundary

Example: Performance from using different prototypes Features are Gaussian in nature, different means, uncorrelated, equal variances:

Example: Performance from using different prototypes Features are Gaussian in nature, different means, uncorrelated, different variances:

Example: Performance from using different prototypes MED decision boundary:

Example: Performance from using different prototypes NN decision boundary:

MED Classifier: Example Suppose we are given the following labeled samples: Class 1: $x_1 = [2\ 1]^T$, $x_2 = [3\ 2]^T$, $x_3 = [2\ 7]^T$, $x_4 = [5\ 2]^T$. Class 2: $x_1 = [3\ 3]^T$, $x_2 = [4\ 4]^T$, $x_3 = [3\ 9]^T$, $x_4 = [6\ 4]^T$. Suppose we wish to build an MED classifier using the sample means as prototypes. Compute the discriminant function for each class. Sketch the decision boundary.

MED Classifier: Example Step 1: Find the sample mean prototype for each class:
$z_1 = \frac{1}{N_1} \sum_{i=1}^{N_1} x_i = \frac{1}{4}\{[2\ 1]^T + [3\ 2]^T + [2\ 7]^T + [5\ 2]^T\} = \frac{1}{4}[12\ 12]^T = [3\ 3]^T$ (12)
$z_2 = \frac{1}{N_2} \sum_{i=1}^{N_2} x_i = \frac{1}{4}\{[3\ 3]^T + [4\ 4]^T + [3\ 9]^T + [6\ 4]^T\} = \frac{1}{4}[16\ 20]^T = [4\ 5]^T$ (13)

MED Classifier: Example Step 2: Find the discriminant functions for each class based on the MED decision rule. Recall that the MED decision criterion for the two-class case is:
$d_E(x, z_1) < d_E(x, z_2)$ (14)
$[(x - z_1)^T (x - z_1)]^{1/2} < [(x - z_2)^T (x - z_2)]^{1/2}$ (15)
$(x - z_1)^T (x - z_1) < (x - z_2)^T (x - z_2)$ (16)
$-z_1^T x + \frac{1}{2} z_1^T z_1 < -z_2^T x + \frac{1}{2} z_2^T z_2$ (17)

MED Classifier: Example Plugging in $z_1$ and $z_2$ gives us:
$-z_1^T x + \frac{1}{2} z_1^T z_1 < -z_2^T x + \frac{1}{2} z_2^T z_2$ (18)
$-([3\ 3]^T)^T [x_1\ x_2]^T + \frac{1}{2}([3\ 3]^T)^T [3\ 3]^T < -([4\ 5]^T)^T [x_1\ x_2]^T + \frac{1}{2}([4\ 5]^T)^T [4\ 5]^T$ (19)
$-[3\ 3][x_1\ x_2]^T + \frac{1}{2}[3\ 3][3\ 3]^T < -[4\ 5][x_1\ x_2]^T + \frac{1}{2}[4\ 5][4\ 5]^T$ (20)

MED Classifier: Example Plugging in $z_1$ and $z_2$ gives us:
$-3x_1 - 3x_2 + 9 < -4x_1 - 5x_2 + \frac{41}{2}$ (21)
Therefore, the discriminant functions are:
$g_1(x_1, x_2) = -3x_1 - 3x_2 + 9$ (22)
$g_2(x_1, x_2) = -4x_1 - 5x_2 + \frac{41}{2}$ (23)

MED Classifier: Decision Boundary Step 3: Find the decision boundary between classes 1 and 2. For the MED classifier, the decision boundary is
$g(x) = (z_k - z_l)^T x + \frac{1}{2}(z_l^T z_l - z_k^T z_k) = 0$ (24)
$g(x_1, x_2) = g_1(x_1, x_2) - g_2(x_1, x_2) = 0$ (25)
Plugging in the discriminant functions $g_1$ and $g_2$ gives us:
$g(x_1, x_2) = -3x_1 - 3x_2 + 9 - (-4x_1 - 5x_2 + \frac{41}{2}) = 0$ (26)

MED Classifier: Decision Boundary Grouping terms:
$-3x_1 - 3x_2 + 9 + 4x_1 + 5x_2 - \frac{41}{2} = 0$ (27)
$2x_2 + x_1 - \frac{23}{2} = 0$ (28)
$x_2 = -\frac{1}{2} x_1 + \frac{23}{4}$ (29)
Therefore, the decision boundary is just a straight line with a slope of $-\frac{1}{2}$ and an offset of $\frac{23}{4}$.
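
A quick numerical check of the worked example, as a sketch (variable names are our own):

```python
import numpy as np

# Labeled samples from the example
X1 = np.array([[2, 1], [3, 2], [2, 7], [5, 2]], dtype=float)
X2 = np.array([[3, 3], [4, 4], [3, 9], [6, 4]], dtype=float)

# Step 1: sample-mean prototypes
z1, z2 = X1.mean(axis=0), X2.mean(axis=0)        # [3, 3] and [4, 5]

# Step 2: discriminants g_k(x) = -z_k^T x + 0.5 * z_k^T z_k
def g1(x): return -z1 @ x + 0.5 * (z1 @ z1)      # -3*x1 - 3*x2 + 9
def g2(x): return -z2 @ x + 0.5 * (z2 @ z2)      # -4*x1 - 5*x2 + 41/2

# Step 3: any point on the boundary x2 = -x1/2 + 23/4 should give g1 - g2 = 0
x = np.array([2.0, -0.5 * 2.0 + 23 / 4])
print(z1, z2, g1(x) - g2(x))                     # [3. 3.] [4. 5.] 0.0
```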

MED Classifier: Decision Boundary Step 4: Sketch decision boundary