Supervised Learning: Nearest Neighbors

CS 2750: Machine Learning. Supervised Learning: Nearest Neighbors. Prof. Adriana Kovashka, University of Pittsburgh, February 1, 2016

Today: Supervised Learning Part I. Basic formulation of the simplest classifier: K-Nearest Neighbors. Example uses. Generalizing the distance metric and weighting neighbors differently. Problems: the curse of dimensionality, picking K, approximation strategies.

Key Idea. Supervised learning: we want to learn to predict, for a new data point x, its label y (e.g. spam / not spam). Don't learn an explicit function F: X → Y. Keep all training data {X, Y}. For a test example x, find the training example x_i closest to it (e.g. using Euclidean distance), then copy its target label y_i as the label for x.

Related Methods: instance-based methods, exemplar methods, memory-based methods, non-parametric methods.

1-Nearest Neighbor Classifier. (Figure: training examples from class 1 and class 2, plus a test example.) f(x) = label of the training example nearest to x. All we need is a distance function for our inputs. No training required! Slide credit: Lana Lazebnik
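
A minimal sketch of this rule in Python (NumPy assumed; the function and variable names are mine, not from the slides):

    import numpy as np

    def nn_classify(X_train, y_train, x_query):
        """Return the label of the training example nearest to x_query (Euclidean distance)."""
        dists = np.linalg.norm(X_train - x_query, axis=1)  # distance to every training point
        return y_train[np.argmin(dists)]                   # copy the nearest example's label

    # Toy usage: two classes in 2D.
    X_train = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
    y_train = np.array([0, 0, 1, 1])
    print(nn_classify(X_train, y_train, np.array([0.8, 0.9])))  # prints 1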

K-Nearest Neighbors Classifier. For a new point, find the k closest points from the training data (e.g. k = 5); the labels of those k points vote to classify it. (Figure: black = negative, red = positive. If the query lands here, the 5 nearest neighbors consist of 3 negatives and 2 positives, so we classify it as negative.) Slide credit: David Lowe

1-nearest neighbor, 3-nearest neighbor, 5-nearest neighbor. (Figures: the same 2D scatter of 'x' and 'o' training points, plotted against features x1 and x2, with the decision for a query point '+' under k = 1, 3, 5.) What are the tradeoffs of having a too large k? A too small k? Slide credit: Derek Hoiem

Instance/Memory-based Learning. Four things make a memory-based learner: (1) a distance metric; (2) how many nearby neighbors to look at; (3) a weighting function (optional); (4) how to fit with the local points. Slide credit: Carlos Guestrin

1-Nearest Neighbor. Four things make a memory-based learner: (1) a distance metric: Euclidean (and others); (2) how many nearby neighbors to look at: 1; (3) a weighting function (optional): not used; (4) how to fit with the local points: just predict the same output as the nearest neighbor. Slide credit: Carlos Guestrin

k-Nearest Neighbor. Four things make a memory-based learner: (1) a distance metric: Euclidean (and others); (2) how many nearby neighbors to look at: k; (3) a weighting function (optional): not used; (4) how to fit with the local points: just predict the average output among the k nearest neighbors. Slide credit: Carlos Guestrin

Formal Definition. Let N_k(x) denote the k training examples nearest to the query x. Classification: predict the majority label among {y_i : x_i in N_k(x)}. Regression: predict the average output, (1/k) Σ_{x_i in N_k(x)} y_i.
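
A sketch of both cases, assuming NumPy, plain Euclidean distance, and arbitrary tie-breaking in the vote (names are illustrative):

    import numpy as np
    from collections import Counter

    def knn_predict(X_train, y_train, x_query, k=5, task="classification"):
        dists = np.linalg.norm(X_train - x_query, axis=1)
        nn_idx = np.argsort(dists)[:k]          # indices of the k nearest neighbors
        neighbors = y_train[nn_idx]
        if task == "classification":
            return Counter(neighbors).most_common(1)[0][0]  # majority vote
        return neighbors.mean()                             # regression: average output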

Example: Predict where this picture was taken Hays and Efros, IM2GPS: Estimating Geographic Information from a Single Image, CVPR 2008

6+ million geotagged photos by 109,788 photographers Hays and Efros, IM2GPS: Estimating Geographic Information from a Single Image, CVPR 2008

Scene Matches Hays and Efros, IM2GPS: Estimating Geographic Information from a Single Image, CVPR 2008

The Importance of Data Hays and Efros, IM2GPS: Estimating Geographic Information from a Single Image, CVPR 2008

(Figures: worked example of distance computations between feature vectors: 4 + 9 = 13, 1 + 1 + 1 = 3, 1 + 4 = 5, 1 + 9 + 4 = 14, 0 + 0 + 0 = 0. Slide credit: Alexander Ihler.) Latent Semantic Indexing (LSI) worked-out example: http://nlp.stanford.edu/ir-book/html/htmledition/latent-semantic-indexing-1.html

k-Nearest Neighbor (recap): a distance metric (Euclidean and others), the number of neighbors k, an optional weighting function, and how to fit with the local points. Slide credit: Carlos Guestrin

Distances. Suppose I want my overall distance to charge more for differences in the x2 direction than in the x1 direction. Setup A: equal weighting on all directions. Setup B: more weight on the x2 direction. Will my neighborhoods be longer in the x1 or x2 direction? (With more weight on x2, a fixed distance budget tolerates less variation in x2, so the neighborhoods stretch along x1.)

Voronoi partitioning: nearest-neighbor regions. All points in a region are closer to the seed in that region than to any other seed (black dots = seeds). Figure from Wikipedia

Multivariate distance metrics. Suppose the input vectors x_1, x_2, ..., x_N are two-dimensional: x_1 = (x_11, x_12), x_2 = (x_21, x_22), ..., x_N = (x_N1, x_N2). Compare Dist(x_i, x_j) = (x_i1 - x_j1)^2 + (x_i2 - x_j2)^2 with Dist(x_i, x_j) = (x_i1 - x_j1)^2 + (3x_i2 - 3x_j2)^2. The relative scalings in the distance metric affect region shapes. Slide credit: Carlos Guestrin

Distance metrics. Euclidean: d(x, x') = sqrt(Σ_k (x_k - x'_k)^2). Mahalanobis: d(x, x') = sqrt((x - x')^T A (x - x')) for a positive semi-definite matrix A. Minkowski: d(x, x') = (Σ_k |x_k - x'_k|^p)^(1/p).
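
The same three metrics written out in NumPy (a sketch; A is any positive semi-definite matrix, and A = identity recovers Euclidean distance):

    import numpy as np

    def euclidean(x, z):
        return np.sqrt(np.sum((x - z) ** 2))

    def mahalanobis(x, z, A):
        d = x - z
        return np.sqrt(d @ A @ d)  # A positive semi-definite, e.g. an inverse covariance

    def minkowski(x, z, p):
        return np.sum(np.abs(x - z) ** p) ** (1.0 / p)  # p = 2: Euclidean, p = 1: Manhattan

    # Weighting the x2 direction more (as on the previous slide) is Mahalanobis with a diagonal A:
    A = np.diag([1.0, 9.0])  # scaling x2 differences by 3 means a diagonal entry of 3^2 = 9
    print(mahalanobis(np.array([0.0, 0.0]), np.array([1.0, 1.0]), A))  # sqrt(1 + 9) ≈ 3.16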

Distance metrics: Euclidean vs. Manhattan. Figures from Wikipedia

Another generalization: weighted K-NNs. Neighbors are weighted differently, according to their distance from the query. Extremes: with bandwidth = infinity, the prediction is the dataset average; with bandwidth = zero, the prediction becomes 1-NN.

Kernel Regression/Classification. Four things make a memory-based learner: (1) a distance metric: Euclidean (and others); (2) how many nearby neighbors to look at: all of them; (3) a weighting function (optional): w_i = exp(-d(x_i, query)^2 / σ^2), so nearby points are weighted strongly and far points weakly; the parameter σ is the kernel width, and it is very important; (4) how to fit with the local points (regression): predict the weighted average of the outputs, Σ_i w_i y_i / Σ_i w_i. Slide credit: Carlos Guestrin
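
A sketch of the weighted prediction with this Gaussian kernel (NumPy assumed; σ is written as sigma):

    import numpy as np

    def kernel_regression(X_train, y_train, x_query, sigma=1.0):
        dists = np.linalg.norm(X_train - x_query, axis=1)
        w = np.exp(-dists ** 2 / sigma ** 2)    # nearby points weighted strongly, far points weakly
        return np.sum(w * y_train) / np.sum(w)  # weighted average of the outputs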

Problems with Instance-Based Learning. Too many features? It doesn't work well with a large number of irrelevant features: distances are overwhelmed by noisy features, and distances become meaningless in high dimensions (the curse of dimensionality). What is the impact of the value of K? Expensive: there is no learning, so most of the real work is done at test time; for every test sample we must search through the whole dataset, which is very slow, so we must use tricks like approximate nearest neighbor search; and we need to store all the training data. Adapted from Dhruv Batra

Curse of Dimensionality

Curse of Dimensionality. Consider a sphere of radius 1 in d dimensions, and an outer ε-shell of that sphere. What is the ratio (shell volume) / (sphere volume)? Slide credit: Dhruv Batra
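
The answer to the slide's question, using the standard fact that a radius-r ball in d dimensions has volume V_d(r) = C_d r^d:

    \frac{V_{\text{shell}}}{V_{\text{sphere}}}
      = \frac{V_d(1) - V_d(1 - \epsilon)}{V_d(1)}
      = 1 - (1 - \epsilon)^d \longrightarrow 1 \quad \text{as } d \to \infty

So in high dimensions almost all of the sphere's volume lies in a thin shell near its surface, even for small ε.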

Curse of Dimensionality (Figure 1.22 from Bishop)

Curse of Dimensionality. Problem: in very high dimensions, all points become nearly equally close, so the nearest neighbor is barely nearer than any other point. This problem applies to all types of classifiers, not just K-NN.
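
A quick way to see this empirically (a sketch; the sample sizes and dimensions are arbitrary choices):

    import numpy as np

    rng = np.random.default_rng(0)
    for d in [2, 10, 100, 1000]:
        X = rng.random((500, d))  # 500 random points in the d-dimensional unit cube
        q = rng.random(d)         # a random query point
        dists = np.linalg.norm(X - q, axis=1)
        print(d, (dists.max() - dists.min()) / dists.min())  # relative gap: farthest vs. nearest

The relative gap between the nearest and the farthest point shrinks as d grows, so being "nearest" conveys less and less.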

Problems with Instance-Based Learning (recap): what is the impact of the value of K?

(Figures: k-NN decision boundaries for different values of K; larger K simplifies the boundary, smaller K complicates it. Slide credit: Alexander Ihler)

Use a validation set to pick K. (Figure: error as a function of K; the small-K end of the curve is marked as too complex. Slide credit: Alexander Ihler)
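
A sketch of that selection loop, assuming scikit-learn's KNeighborsClassifier and a toy dataset in place of real data:

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    # Toy data: two noisy 2D blobs (a stand-in for a real dataset).
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
    y = np.array([0] * 100 + [1] * 100)
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

    best_k, best_acc = None, -1.0
    for k in [1, 3, 5, 7, 9, 15, 25]:
        acc = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train).score(X_val, y_val)
        if acc > best_acc:
            best_k, best_acc = k, acc
    print("chosen K:", best_k, "validation accuracy:", best_acc)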

Problems with Instance-Based Learning (recap): expense. There is no learning, so most of the real work is done at test time; for every test sample we must search through the whole dataset, which is very slow, so we need tricks like approximate nearest neighbor search; and we must store all the training data. Adapted from Dhruv Batra

Approximate distance methods. Build a balanced tree of the data points (kd-trees), splitting along different dimensions. Go down the tree (starting from the root) and at each node compute the current best guess of the nearest neighbor. Intelligently eliminate parts of the search space if they cannot contain a better current best. Only search for neighbors until some budget is exhausted.
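
SciPy's kd-tree offers this kind of search; its eps argument trades exactness for speed by pruning more aggressively (a sketch, not necessarily the tool used in the lecture):

    import numpy as np
    from scipy.spatial import cKDTree

    rng = np.random.default_rng(0)
    X_train = rng.random((10000, 16))  # 10,000 training points in 16 dimensions
    tree = cKDTree(X_train)            # build a balanced kd-tree once

    query = rng.random(16)
    dists, idx = tree.query(query, k=5)               # exact 5 nearest neighbors
    dists_a, idx_a = tree.query(query, k=5, eps=0.5)  # approximate: may miss slightly closer points
    print(idx, idx_a)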

Figure from Szeliski

Summary. K-Nearest Neighbors is the most basic classifier and the simplest to implement. It is cheap at training time and expensive at test time. Unlike other methods we'll see later, it naturally works for any number of classes. Pick K through a validation set, and use approximate methods for finding neighbors. The success of classification depends on the amount of data and the meaningfulness of the distance function.