Non-Bayesian Classifiers Part I: k-nearest Neighbor Classifier and Distance Functions

Selim Aksoy, Department of Computer Engineering, Bilkent University, saksoy@cs.bilkent.edu.tr. CS 551, Fall 2017. © 2017, Selim Aksoy (Bilkent University).

Non-Bayesian Classifiers

We have been using Bayesian classifiers that make decisions according to the posterior probabilities. We have discussed parametric and non-parametric methods for learning such classifiers by estimating the probabilities from training data. We will now study techniques that use the training data to learn the classifiers directly, without estimating any probabilistic structure. In particular, we will study the k-nearest neighbor classifier, linear discriminant functions, and support vector machines.

The Nearest Neighbor Classifier

Given the training data $D = \{x_1, \ldots, x_n\}$ as a set of $n$ labeled examples, the nearest neighbor classifier assigns a test point $x$ the label of its closest neighbor in $D$. Closeness is defined using a distance function. Given the distance function, the nearest neighbor classifier partitions the feature space into cells consisting of all points closer to a given training point than to any other training point.

The Nearest Neighbor Classifier

All points in such a cell are labeled by the class of the training point it contains, forming a Voronoi tessellation of the feature space.

Figure 1: In two dimensions, the nearest neighbor algorithm leads to a partitioning of the input space into Voronoi cells, each labeled by the class of the training point it contains. In three dimensions, the cells are three-dimensional, and the decision boundary resembles the surface of a crystal.

The k-Nearest Neighbor Classifier

The k-nearest neighbor classifier classifies $x$ by assigning it the label most frequently represented among the $k$ nearest training samples. In other words, a decision is made by examining the labels of the $k$ nearest neighbors and taking a vote.

Figure 2: The k-nearest neighbor query grows a spherical region around the test point $x$ until it encloses $k$ training samples, and labels the test point by a majority vote of these samples. In this example with $k = 5$, the test point would be labeled black.
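As a concrete illustration of this voting rule, here is a minimal NumPy sketch of a k-nearest neighbor classifier using the Euclidean distance; the function and variable names (knn_classify, X_train, y_train) are illustrative and not part of the lecture.

import numpy as np

def knn_classify(X_train, y_train, x, k=5):
    """Label a test point x by a majority vote among its k nearest
    training samples, using the Euclidean (L2) distance."""
    # Distance from x to every stored training point
    dists = np.linalg.norm(X_train - x, axis=1)
    # Indices of the k closest training points
    nearest = np.argsort(dists)[:k]
    # Majority vote over their class labels
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]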

The k-Nearest Neighbor Classifier

The computational complexity of the nearest neighbor algorithm, both in space (storage) and time (search), has received a great deal of analysis. In the most straightforward approach, we inspect each stored training point one by one, calculate its distance to $x$, and keep a list of the $k$ closest ones. There are parallel implementations and algorithmic techniques for reducing the computational load in nearest neighbor searches.

The k-Nearest Neighbor Classifier

Examples of algorithmic techniques include:
- computing partial distances using a subset of the dimensions, and eliminating points whose partial distance is already greater than the full distance to the current closest points,
- using search trees that are hierarchically structured so that only a subset of the training points needs to be considered during the search,
- editing the training set by eliminating points that are surrounded by other training points with the same class label.
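As an example of the first technique, the sketch below shows partial-distance pruning for a 1-nearest neighbor search: squared differences are accumulated one dimension at a time, and a training point is discarded as soon as its partial distance exceeds the full distance of the current closest point. This is a minimal illustration under assumed names (nn_partial_distance), not the lecture's implementation.

import numpy as np

def nn_partial_distance(X_train, x):
    """Brute-force 1-NN search with partial-distance pruning."""
    best_idx, best_sqdist = -1, np.inf
    for i, p in enumerate(X_train):
        partial = 0.0
        for j in range(len(x)):
            partial += (p[j] - x[j]) ** 2   # accumulate one dimension at a time
            if partial >= best_sqdist:      # already worse than the current best
                break                       # -> skip the remaining dimensions
        else:
            best_idx, best_sqdist = i, partial
    return best_idx, np.sqrt(best_sqdist)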

Distance Functions

The nearest neighbor classifier relies on a metric, or distance function, between points. For all points $x$, $y$, and $z$, a metric $D(\cdot, \cdot)$ must satisfy the following properties:

Nonnegativity: $D(x, y) \geq 0$.
Reflexivity: $D(x, y) = 0$ if and only if $x = y$.
Symmetry: $D(x, y) = D(y, x)$.
Triangle inequality: $D(x, y) + D(y, z) \geq D(x, z)$.

If the second property is not satisfied, $D(\cdot, \cdot)$ is called a pseudometric.
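For intuition, these axioms can be spot-checked numerically on a finite set of points. The following sketch is an illustration only (a check on samples, not a proof); the function name check_metric_axioms is an assumption.

import numpy as np

def check_metric_axioms(D, points, tol=1e-9):
    """Spot-check the four metric properties for D on a finite sample of points."""
    for x in points:
        assert D(x, x) <= tol                           # reflexivity: D(x, x) = 0
        for y in points:
            d_xy = D(x, y)
            assert d_xy >= -tol                         # nonnegativity
            assert abs(d_xy - D(y, x)) <= tol           # symmetry
            for z in points:
                assert d_xy + D(y, z) >= D(x, z) - tol  # triangle inequality
    return True

# e.g., check_metric_axioms(lambda a, b: float(np.linalg.norm(a - b)), np.random.randn(20, 3))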

Distance Functions

A general class of metrics for $d$-dimensional patterns is the Minkowski metric,

$$L_p(x, y) = \left( \sum_{i=1}^{d} |x_i - y_i|^p \right)^{1/p},$$

also referred to as the $L_p$ norm.

The Euclidean distance is the $L_2$ norm

$$L_2(x, y) = \left( \sum_{i=1}^{d} |x_i - y_i|^2 \right)^{1/2}.$$

The Manhattan or city block distance is the $L_1$ norm

$$L_1(x, y) = \sum_{i=1}^{d} |x_i - y_i|.$$

Distance Functions

The $L_\infty$ norm is the maximum of the distances along the individual coordinate axes,

$$L_\infty(x, y) = \max_{i=1}^{d} |x_i - y_i|.$$

Figure 3: Each colored shape consists of the points at distance 1.0 from the origin, measured using different values of $p$ in the Minkowski $L_p$ metric.
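A compact way to see how these norms relate is to implement the Minkowski distance with $p$ as a parameter; the sketch below (the function name minkowski and the example vectors are illustrative) reproduces the $L_1$, $L_2$, and $L_\infty$ special cases.

import numpy as np

def minkowski(x, y, p):
    """L_p distance between two vectors: p = 1 gives the city block
    distance, p = 2 the Euclidean distance, and p = inf the maximum
    coordinate-wise difference."""
    diff = np.abs(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))
    if np.isinf(p):
        return diff.max()
    return (diff ** p).sum() ** (1.0 / p)

x, y = [0.0, 0.0], [3.0, 4.0]
print(minkowski(x, y, 1))       # 7.0  (L1, city block)
print(minkowski(x, y, 2))       # 5.0  (L2, Euclidean)
print(minkowski(x, y, np.inf))  # 4.0  (L_infinity)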

Feature Normalization

We should be careful about the scaling of the coordinate axes when we compute these metrics. When the ranges of the data differ greatly along different axes of a multidimensional space, these metrics implicitly assign more weight to features with large ranges than to those with small ranges.

Feature normalization can be used to approximately equalize the ranges of the features so that they have approximately the same effect in the distance computation. The following methods can be used to normalize each feature independently.

Feature Normalization

Linear scaling to unit range: Given a lower bound $l$ and an upper bound $u$ for a feature $x \in \mathbb{R}$,

$$\tilde{x} = \frac{x - l}{u - l}$$

results in $\tilde{x}$ being in the $[0, 1]$ range.

Linear scaling to unit variance: A feature $x \in \mathbb{R}$ can be transformed to a random variable with zero mean and unit variance as

$$\tilde{x} = \frac{x - \mu}{\sigma}$$

where $\mu$ and $\sigma$ are the sample mean and the sample standard deviation of that feature, respectively.
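Both linear normalizations are one-liners in NumPy; a minimal sketch with illustrative function names:

import numpy as np

def scale_to_unit_range(x, l, u):
    """Linear scaling to [0, 1] given a lower bound l and an upper bound u."""
    return (np.asarray(x, dtype=float) - l) / (u - l)

def scale_to_unit_variance(x):
    """Shift and scale a feature to zero mean and unit variance,
    using the sample mean and the sample standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()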

Feature Normalization

Normalization using the cumulative distribution function: Given a random variable $x \in \mathbb{R}$ with cumulative distribution function $F_x(x)$, the random variable $\tilde{x}$ resulting from the transformation $\tilde{x} = F_x(x)$ will be uniformly distributed in $[0, 1]$.

Rank normalization: Given the sample for a feature as $x_1, \ldots, x_n \in \mathbb{R}$, first find the order statistics $x_{(1)}, \ldots, x_{(n)}$ and then replace each pattern's feature value by its corresponding normalized rank,

$$\tilde{x}_i = \frac{\operatorname{rank}_{x_1, \ldots, x_n}(x_i) - 1}{n - 1},$$

where $x_i$ is the feature value for the $i$th pattern. This procedure uniformly maps all feature values to the $[0, 1]$ range.
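These last two normalizations can be sketched as follows; rank_normalize implements the normalized-rank mapping, and cdf_normalize uses the empirical cumulative distribution function as a stand-in for $F_x$ (the function names and the empirical-CDF choice are assumptions for illustration).

import numpy as np

def rank_normalize(x):
    """Replace each feature value by its normalized rank,
    (rank(x_i) - 1) / (n - 1), mapping all values to [0, 1]."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x)                   # positions of the order statistics
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    return (ranks - 1) / (len(x) - 1)

def cdf_normalize(x):
    """Map each value through the empirical CDF, giving values
    approximately uniform on [0, 1]."""
    x = np.asarray(x, dtype=float)
    return np.searchsorted(np.sort(x), x, side="right") / len(x)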