CSC 411: Lecture 05: Nearest Neighbors


Raquel Urtasun & Rich Zemel
University of Toronto
Sep 28, 2015

Today
- Non-parametric models
- Distance
- Non-linear decision boundaries

Classification: Oranges and Lemons

Classification: Oranges and Lemons
Can construct a simple linear decision boundary:
    y = sign(w_0 + w_1 x_1 + w_2 x_2)

What is the meaning of linear classification?
Classification is intrinsically non-linear: it puts non-identical things in the same class, so a difference in the input vector sometimes causes zero change in the answer.
Linear classification means that the part that adapts is linear (just like linear regression):
    z(x) = w^T x + w_0
with adaptive w, w_0. The adaptive part is followed by a non-linearity to make the decision:
    y(x) = f(z(x))
What f have we seen so far in class?
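As a minimal NumPy sketch of this two-stage structure (the function name and weights are illustrative, not from the lecture):

```python
import numpy as np

def linear_classify(X, w, w0):
    """Two-stage classifier: adaptive linear part z = Xw + w0,
    followed by a fixed non-linearity f (here, sign)."""
    z = X @ w + w0        # the part that adapts during learning
    return np.sign(z)     # the fixed decision non-linearity f

# Example: classify two 2-D points with illustrative weights.
X = np.array([[1.0, 2.0], [-1.0, -0.5]])
print(linear_classify(X, w=np.array([0.5, -0.3]), w0=0.2))  # [ 1. -1.]
```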

Classification as Induction

Instance-based Learning
The alternative to a parametric model is a non-parametric one:
- Simple methods for approximating discrete-valued or real-valued target functions (classification or regression problems)
- Learning amounts to simply storing the training data
- Test instances are classified using similar training instances
This embodies often-sensible underlying assumptions:
- Output varies smoothly with input
- Data occupies a sub-space of the high-dimensional input space

Nearest Neighbors
Assume training examples correspond to points in d-dimensional Euclidean space. The target function value for a new query is estimated from the known value of the nearest training example(s). Distance is typically defined to be Euclidean:
    ||x^(a) - x^(b)||_2 = sqrt( sum_{j=1}^{d} (x_j^(a) - x_j^(b))^2 )

Algorithm:
1. Find the example (x*, t*) in the training set closest to the test instance x^(q)
2. Output y^(q) = t*

Note: we don't need to compute the square root. Why?
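A minimal NumPy sketch of this algorithm (names are illustrative): it compares squared distances only; the square root is monotonic, so it does not change which point is closest, which answers the question above.

```python
import numpy as np

def nearest_neighbor(X_train, t_train, x_query):
    """1-NN: return the target of the training example closest to x_query.
    Uses squared Euclidean distance; sqrt is monotonic, so the argmin
    is the same as with the true Euclidean distance."""
    sq_dists = np.sum((X_train - x_query) ** 2, axis=1)  # O(dn) scan
    return t_train[np.argmin(sq_dists)]
```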

Nearest Neighbors: Decision Boundaries
The nearest neighbor algorithm does not explicitly compute decision boundaries, but these can be inferred.
Decision boundaries: the Voronoi diagram visualization (sketched below)
- shows how the input space is divided into classes
- each line segment is equidistant between two points of opposite classes
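To make the cells concrete, a small sketch (assuming SciPy and Matplotlib are available) plots the Voronoi diagram of some random 2-D points; the 1-NN decision boundary follows the cell edges between points of opposite classes.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial import Voronoi, voronoi_plot_2d

rng = np.random.default_rng(0)
points = rng.normal(size=(20, 2))  # illustrative training points

vor = Voronoi(points)              # each cell = region closest to one point
voronoi_plot_2d(vor)
plt.title("Voronoi cells induced by 1-NN")
plt.show()
```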

k Nearest Neighbors
Nearest neighbors is sensitive to mis-labeled data ("class noise"): smooth by having the k nearest neighbors vote.

Algorithm:
1. Find the k examples {(x^(i), t^(i))} closest to the test instance x
2. The classification output is the majority class:
    y = argmax_{t^(z)} sum_{r=1}^{k} δ(t^(z), t^(r))
   where δ(a, b) = 1 if a = b and 0 otherwise.
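A minimal sketch of this vote in NumPy (names are illustrative; Counter plays the role of the δ-sum):

```python
import numpy as np
from collections import Counter

def knn_classify(X_train, t_train, x_query, k=3):
    """k-NN: majority vote among the k training examples closest to x_query."""
    sq_dists = np.sum((X_train - x_query) ** 2, axis=1)
    nearest = np.argsort(sq_dists)[:k]             # indices of the k closest points
    votes = Counter(t_train[i] for i in nearest)   # tally: sum_r delta(t, t^(r))
    return votes.most_common(1)[0][0]              # class with the most votes
```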

k Nearest Neighbors: Issues & Remedies
- Some attributes have larger ranges, so they are treated as more important → normalize scale
- Irrelevant or correlated attributes add noise to the distance measure → eliminate some attributes, or vary (and possibly adapt) the weight of each attribute
- Non-metric attributes (symbols) → Hamming distance
- Brute-force approach: calculate the Euclidean distance to the test point from each stored point and keep the closest, O(dn) per query

We need to reduce the computational burden:
1. Use a subset of dimensions
2. Use a subset of examples
   - Remove examples that lie within a Voronoi region
   - Form an efficient search tree (kd-tree), use hashing (LSH), etc. (a sketch follows this list)
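A short sketch of two of these remedies, assuming SciPy is available: standardize the attributes so that no scale dominates, then build a kd-tree so each query costs roughly O(d log n) in low dimensions rather than the O(dn) brute-force scan.

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 2)) * np.array([1.0, 100.0])  # mismatched scales
x_query = np.array([0.5, 30.0])

# Remedy 1: normalize scale so the large-range attribute does not dominate.
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
X_std = (X_train - mu) / sigma

# Remedy 2: build a kd-tree once; each query is then ~O(d log n)
# in low dimensions, instead of the O(dn) brute-force scan.
tree = cKDTree(X_std)
dists, idx = tree.query((x_query - mu) / sigma, k=3)
print(idx)  # indices of the 3 nearest training points (in standardized space)
```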

Decision Boundary: K-NN

K-NN Summary
- Single parameter (k): how do we set it?
- Naturally forms complex decision boundaries; adapts to data density
- Problems:
  - Sensitive to class noise
  - Sensitive to dimensional scales
  - Distances are less meaningful in high dimensions
  - Scales with the number of examples
- Inductive bias: what kind of decision boundaries do we expect to find?