Machine Learning: k-nearest Neighbors. Lecture 08. Razvan C. Bunescu, School of Electrical Engineering and Computer Science, bunescu@ohio.edu

Nonparametric Methods: k-nearest Neighbors
Input: a training dataset $(x_1, t_1), (x_2, t_2), \ldots, (x_n, t_n)$ and a test instance $x$. Output: estimated class label $y(x)$.
1. Find the $k$ instances $x_1, x_2, \ldots, x_k$ nearest to $x$.
2. Let
$$y(x) = \arg\max_{t \in T} \sum_{i=1}^{k} \delta(t, t_i), \qquad \delta(t, t_i) = \begin{cases} 1 & t = t_i \\ 0 & t \neq t_i \end{cases}$$
where $\delta$ is the Kronecker delta function.
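A minimal sketch of this majority-vote rule in Python (assuming NumPy; the function name and toy data are illustrative, not from the slides):

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, t_train, x, k=4):
    """Predict the label of x by majority vote among its k nearest neighbors."""
    # Euclidean distances from x to every training instance.
    dists = np.linalg.norm(X_train - x, axis=1)
    # Indices of the k closest training instances.
    nearest = np.argsort(dists)[:k]
    # arg max over labels of the summed Kronecker deltas == most common label.
    return Counter(t_train[nearest]).most_common(1)[0][0]

# Toy usage: two 2-D classes.
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
t_train = np.array([0, 0, 1, 1])
print(knn_classify(X_train, t_train, np.array([0.95, 0.9]), k=3))  # -> 1
```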

k-nearest Neighbors (k-NN): Euclidean distance, k = 4. [figure]

k-nearest Neighbors (k-NN): Euclidean distance, k = 1. The Voronoi diagram gives the decision boundary. [figure]

k-NN for Classification: Probabilistic Justification
Assume a dataset with $N_j$ points in class $C_j$, so the total number of points is $N = \sum_j N_j$.
Draw a sphere centered at $x$ containing $K$ points: the sphere has volume $V$ and contains $K_j$ points from class $C_j$.
If $V$ is sufficiently small and $K$ is sufficiently large, we can estimate [2.5.1]:
$$p(x \mid C_j) = \frac{K_j}{N_j V}, \qquad p(x) = \frac{K}{N V}, \qquad p(C_j) = \frac{N_j}{N}$$
Bayes' theorem $\Rightarrow$ $p(C_j \mid x) = \dfrac{p(x \mid C_j)\, p(C_j)}{p(x)} = \dfrac{K_j}{K}$ $\Rightarrow$ choose the class $C_j$ with the most neighbors among the $K$.

Distance Metrics
Euclidean distance:
$$d(x, y) = \|x - y\|_2 = \sqrt{(x - y)^T (x - y)}$$
Hamming distance: the number of (discrete) features that have different values in $x$ and $y$.
Mahalanobis distance:
$$d(x, y) = \sqrt{(x - y)^T \Sigma^{-1} (x - y)}$$
where $\Sigma$ is the (sample) covariance matrix. A scale-invariant metric that normalizes for variance: if $\Sigma = I$ $\Rightarrow$ Euclidean distance; if $\Sigma^{-1} = \mathrm{diag}(\sigma_1^{-2}, \sigma_2^{-2}, \ldots, \sigma_K^{-2})$ $\Rightarrow$ normalized Euclidean distance.
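As a rough illustration, the three metrics could be computed as follows (a NumPy sketch; the helper names and sample vectors are mine, not from the slides):

```python
import numpy as np

def euclidean(x, y):
    return np.sqrt((x - y) @ (x - y))

def hamming(x, y):
    # Number of (discrete) features with different values.
    return int(np.sum(x != y))

def mahalanobis(x, y, cov):
    # cov is the (sample) covariance matrix Sigma.
    diff = x - y
    return np.sqrt(diff @ np.linalg.inv(cov) @ diff)

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 2.0, 1.0])
cov = np.diag([1.0, 4.0, 9.0])   # diagonal Sigma -> normalized Euclidean distance
print(euclidean(x, y), hamming(x, y), mahalanobis(x, y, cov))
```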

Distance Metrics
Cosine distance:
$$d(x, y) = 1 - \cos(x, y) = 1 - \frac{x^T y}{\|x\|\, \|y\|}$$
used for text and other high-dimensional data.
Levenshtein distance (edit distance): a distance metric on strings (sequences of symbols); the minimum number of basic edit operations (delete, insert, substitute) that transform one string into the other. Example: $x$ = athens, $y$ = hints $\Rightarrow$ $d(x, y) = 4$. Used in bioinformatics.
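A small dynamic-programming sketch of the Levenshtein distance (illustrative code, not from the slides); it reproduces d(athens, hints) = 4:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    prev = list(range(len(b) + 1))              # distances from a[:0] to every prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,        # delete ca
                            curr[j - 1] + 1,    # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]

print(levenshtein("athens", "hints"))  # -> 4, as on the slide
```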

Efficient Indexing
Linear search for the k nearest neighbors is not efficient for large training sets: O(N) time complexity.
For Euclidean distance, use a kd-tree: instances are stored at the leaves of the tree; internal nodes branch on threshold tests on individual features; the expected time to find the nearest neighbor is O(log N).
Indexing structures depend on the distance function: e.g., an inverted index for text retrieval with cosine similarity.
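For instance, a kd-tree query could look like the following sketch, assuming SciPy's cKDTree is acceptable (the data and value of k are made up):

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
X_train = rng.random((10_000, 3))        # 10k training instances in 3-D

tree = cKDTree(X_train)                  # build once
x = np.array([0.5, 0.5, 0.5])
dists, idx = tree.query(x, k=4)          # k nearest neighbors, ~O(log N) expected per query
print(idx, dists)
```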

k-NN and the Curse of Dimensionality
Standard metrics weight each feature equally, which is problematic when many features are irrelevant.
One solution is to weight each feature differently, using a measure of its ability to discriminate between classes, such as: Information Gain, the chi-square statistic, Pearson correlation, the signal-to-noise ratio, or the t-test.
This stretches the axes: lengthening them for relevant features and shortening them for irrelevant features. It is equivalent to a Mahalanobis distance with a diagonal covariance matrix.
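One way to read "stretch the axes": rescale each squared feature difference by a per-feature weight before taking the square root, which is a Mahalanobis distance with diagonal $\Sigma^{-1}$. A hypothetical sketch (the weights here are invented for illustration, not derived from any of the measures above):

```python
import numpy as np

def weighted_euclidean(x, y, w):
    # Equivalent to a Mahalanobis distance with Sigma^{-1} = diag(w).
    diff = x - y
    return np.sqrt(np.sum(w * diff ** 2))

x = np.array([1.0, 5.0, 0.2])
y = np.array([1.5, 4.0, 0.9])
w = np.array([2.0, 1.0, 0.0])   # last feature judged irrelevant -> weight 0
print(weighted_euclidean(x, y, w))
```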

Distance-Weighted k-NN
For any test point $x$, weight each of the $k$ neighbors according to its distance from $x$.
1. Find the $k$ instances $x_1, x_2, \ldots, x_k$ nearest to $x$.
2. Let
$$y(x) = \arg\max_{t \in T} \sum_{i=1}^{k} w_i\, \delta(t, t_i), \qquad w_i = \|x - x_i\|^{-2}$$
where $w_i$ measures the similarity between $x$ and $x_i$.
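A sketch of the distance-weighted vote (illustrative names; the zero-distance guard is my addition, since $w_i = \|x - x_i\|^{-2}$ is undefined when $x$ coincides with a training point):

```python
import numpy as np

def weighted_knn_classify(X_train, t_train, x, k=5):
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    if dists[nearest[0]] == 0.0:                # x equals a training instance
        return t_train[nearest[0]]
    weights = dists[nearest] ** -2              # w_i = ||x - x_i||^{-2}
    labels = np.unique(t_train[nearest])
    # arg max over labels of the summed weights of neighbors carrying that label
    scores = [weights[t_train[nearest] == t].sum() for t in labels]
    return labels[int(np.argmax(scores))]
```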

Kernel-based Distance-Weighted NN
For any test point $x$, weight all training instances according to their similarity with $x$.
1. Assume binary classification, $T = \{+1, -1\}$.
2. Compute the weighted majority:
$$y(x) = \operatorname{sign}\left( \sum_{i=1}^{N} K(x, x_i)\, t_i \right)$$
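A minimal sketch with a Gaussian kernel standing in for $K$ (the slide leaves the kernel generic; the kernel choice and bandwidth here are assumptions):

```python
import numpy as np

def kernel_nn_classify(X_train, t_train, x, two_sigma_sq=1.0):
    """Weighted majority over ALL training points, with labels in {+1, -1}."""
    sq_dists = np.sum((X_train - x) ** 2, axis=1)
    K = np.exp(-sq_dists / two_sigma_sq)        # Gaussian kernel weights
    return int(np.sign(np.sum(K * t_train)))    # note: returns 0 on an exact tie
```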

Regression with k-Nearest Neighbors
Input: a training dataset $(x_1, t_1), (x_2, t_2), \ldots, (x_n, t_n)$ and a test instance $x$. Output: estimated function value $y(x)$.
1. Find the $k$ instances $x_1, x_2, \ldots, x_k$ nearest to $x$.
2. Let
$$y(x) = \frac{1}{k} \sum_{i=1}^{k} t_i$$
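The regression version simply averages the neighbors' target values; a short sketch with illustrative names:

```python
import numpy as np

def knn_regress(X_train, t_train, x, k=3):
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    return t_train[nearest].mean()              # y(x) = (1/k) * sum of the t_i
```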

Three Datasets & Linear Interpolation [http://www.autonlab.org/tutorials/mbl08.pdf]
Linear interpolation does not always lead to good models of the data. [figure]

Regression with 1-Nearest Neighbor [figures: 1-NN fits on the three datasets] $\Rightarrow$ 1-NN has high variance.

Regression with 9-Nearest Neighbors [figures comparing k = 1 and k = 9 on the three datasets]

Distance-Weighted k-NN for Regression
For any test point $x$, weight each of the $k$ neighbors according to its similarity with $x$.
1. Find the $k$ instances $x_1, x_2, \ldots, x_k$ nearest to $x$.
2. Let
$$y(x) = \frac{\sum_{i=1}^{k} w_i t_i}{\sum_{i=1}^{k} w_i}, \qquad w_i = \|x - x_i\|^{-2}$$
For $k = N$ $\Rightarrow$ Shepard's method [Shepard, ACM '68].
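A sketch of this weighted average (illustrative names; the exact-match guard is my addition). With k set to the full training-set size it reduces to Shepard's method:

```python
import numpy as np

def weighted_knn_regress(X_train, t_train, x, k=5):
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(dists)[:k]
    if dists[nearest[0]] == 0.0:                # exact match in the training set
        return t_train[nearest[0]]
    w = dists[nearest] ** -2                    # w_i = ||x - x_i||^{-2}
    return np.sum(w * t_train[nearest]) / np.sum(w)
```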

Kernel-based Distance-Weighted NN Regression
For any test point $x$, weight all training instances according to their similarity with $x$.
1. Return the weighted average:
$$y(x) = \frac{\sum_{i=1}^{N} K(x, x_i)\, t_i}{\sum_{i=1}^{N} K(x, x_i)}$$

NN Regression with Gaussian Kernel
$$K(x, x_i) = e^{-\frac{\|x - x_i\|^2}{2\sigma^2}}$$
[figures for $2\sigma^2 = 10$, $2\sigma^2 = 20$, $2\sigma^2 = 80$]
Increased kernel width means more influence from distant points.
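Combining the weighted-average formula from the previous slide with this Gaussian kernel gives, roughly, the following sketch (the default bandwidth is just the first value shown above; the function name is mine):

```python
import numpy as np

def gaussian_kernel_regress(X_train, t_train, x, two_sigma_sq=10.0):
    sq_dists = np.sum((X_train - x) ** 2, axis=1)
    K = np.exp(-sq_dists / two_sigma_sq)        # K(x, x_i) = exp(-||x - x_i||^2 / (2 sigma^2))
    return np.sum(K * t_train) / np.sum(K)      # weighted average over ALL training points
```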

NN Regression with Gaussian Kernel
$$K(x, x_i) = e^{-\frac{\|x - x_i\|^2}{2\sigma^2}}$$
[figures for $2\sigma^2$ = 1/16 of the x axis, 1/32 of the x axis, 1/32 of the x axis]

k-Nearest Neighbors Summary
Training: memorize the training examples.
Testing: compute distances/similarities to the training examples.
Trades decreased training time for increased test time.
Can use the kernel trick to work in an implicit high-dimensional space.
Needs feature selection when there are many irrelevant features.
An Instance-Based Learning (IBL) algorithm, also called: memory-based learning, lazy learning, exemplar-based learning, case-based learning.