Instance-Based Learning
www.biostat.wisc.edu/~dpage/cs760/

Goals for the lecture
you should understand the following concepts:
- k-NN classification
- k-NN regression
- edited nearest neighbor
- k-d trees for nearest neighbor identification

Nearest-neighbor classification
learning task
- given a training set (x_1, y_1) ... (x_n, y_n), do nothing (it's sometimes called a lazy learner)
classification task
- given: an instance x_q to classify
- find the training-set instance x_i that is most similar to x_q
- return the class value y_i

The decision regions for nearest-neighbor classification
Voronoi diagram: each polyhedron indicates the region of feature space that is in the nearest neighborhood of each training instance
[figure: Voronoi diagram partitioning a two-dimensional feature space with axes x_1 and x_2]

k-nearest-neighbor classification
classification task
- given: an instance x_q to classify
- find the k training-set instances (x_1, y_1) ... (x_k, y_k) that are most similar to x_q
- return the class value
  ŷ = argmax_{v ∈ values(y)} Σ_{i=1..k} δ(v, y_i)
  where δ(a, b) = 1 if a = b, 0 otherwise
  (i.e. return the class that the plurality of the neighbors have)
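
To make the plurality vote concrete, here is a minimal Python sketch of k-NN classification; the function name knn_classify and the toy data are illustrative, not from the lecture, and Euclidean distance (introduced on the next slide) is assumed.

    import numpy as np
    from collections import Counter

    def knn_classify(X_train, y_train, x_q, k=3):
        """Return the plurality class among the k training instances closest to x_q."""
        dists = np.sqrt(((X_train - x_q) ** 2).sum(axis=1))   # distance to every training instance
        nearest = np.argsort(dists)[:k]                        # indices of the k nearest neighbors
        # argmax over class values of sum_i delta(v, y_i), i.e. a plurality vote
        return Counter(y_train[i] for i in nearest).most_common(1)[0][0]

    # toy usage: two clusters, query near the first one
    X_train = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.9]])
    y_train = np.array(["a", "a", "b", "b"])
    print(knn_classify(X_train, y_train, np.array([1.1, 0.9]), k=3))   # -> a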

How can we determine similarity/distance?
suppose all features are nominal (discrete)
- Hamming distance: count the number of features for which two instances differ
suppose all features are continuous
- Euclidean distance:
  d(x_i, x_j) = sqrt( Σ_f (x_if − x_jf)² )
  where x_if represents the f-th feature of x_i
- could also use Manhattan distance: sum the absolute differences in feature values (the continuous analog of Hamming distance)

How can we determine similarity/distance?
if we have a mix of discrete/continuous features:
  d(x_i, x_j) = Σ_f { |x_if − x_jf|         if f is continuous
                    { 1 − δ(x_if, x_jf)     if f is discrete
if all features are of equal importance, we want to apply some type of normalization (values range from 0 to 1) or standardization (values distributed according to a standard normal) to the continuous features
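
A small sketch of this mixed distance; the function name mixed_distance and the is_discrete mask (marking which features are nominal) are illustrative assumptions, and the continuous features are assumed to be already normalized as the slide suggests.

    def mixed_distance(x_i, x_j, is_discrete):
        """d(x_i, x_j) = sum over features f of:
           |x_if - x_jf| if f is continuous, 1 - delta(x_if, x_jf) if f is discrete."""
        total = 0.0
        for f in range(len(x_i)):
            if is_discrete[f]:
                total += 0.0 if x_i[f] == x_j[f] else 1.0    # 1 - delta(x_if, x_jf)
            else:
                total += abs(float(x_i[f]) - float(x_j[f]))  # assumes values scaled to [0, 1]
        return total

    # feature 0 is continuous (normalized), feature 1 is discrete
    print(mixed_distance([0.25, "red"], [0.5, "blue"], is_discrete=[False, True]))   # -> 1.25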

k-nearest-neighbor regression
learning task
- given a training set (x_1, y_1) ... (x_n, y_n), do nothing
prediction task
- given: an instance x_q to make a prediction for
- find the k training-set instances (x_1, y_1) ... (x_k, y_k) that are most similar to x_q
- return the value
  ŷ = (1/k) Σ_{i=1..k} y_i
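
A minimal sketch of this averaging step; knn_regress and the toy values are illustrative, and Euclidean distance is assumed.

    import numpy as np

    def knn_regress(X_train, y_train, x_q, k=3):
        """Predict the mean y of the k training instances closest to x_q."""
        dists = np.sqrt(((X_train - x_q) ** 2).sum(axis=1))
        nearest = np.argsort(dists)[:k]     # indices of the k nearest neighbors
        return y_train[nearest].mean()      # y_hat = (1/k) * sum_i y_i

    X_train = np.array([[1.0], [2.0], [3.0], [10.0]])
    y_train = np.array([1.0, 2.0, 3.0, 10.0])
    print(knn_regress(X_train, y_train, np.array([2.1]), k=3))   # -> 2.0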

Distance-weighted nearest neighbor
we can have instances contribute to a prediction according to their distance from x_q, using weights
  w_i = 1 / d(x_q, x_i)²
- classification:
  ŷ = argmax_{v ∈ values(y)} Σ_{i=1..k} w_i δ(v, y_i)
- regression:
  ŷ = ( Σ_{i=1..k} w_i y_i ) / ( Σ_{i=1..k} w_i )
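
A sketch of the distance-weighted regression variant; the small eps term, which avoids division by zero when the query coincides with a training instance, is my addition and not on the slide.

    import numpy as np

    def weighted_knn_regress(X_train, y_train, x_q, k=3, eps=1e-12):
        """y_hat = sum_i(w_i * y_i) / sum_i(w_i), with w_i = 1 / d(x_q, x_i)^2."""
        dists = np.sqrt(((X_train - x_q) ** 2).sum(axis=1))
        nearest = np.argsort(dists)[:k]
        w = 1.0 / (dists[nearest] ** 2 + eps)   # weight falls off with squared distance
        return (w * y_train[nearest]).sum() / w.sum()

Squaring the distance makes the closest neighbors dominate the prediction, which is the point of the weighting scheme on the slide.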

Speeding up k-NN
k-NN is a lazy learning algorithm
- it does virtually nothing at training time
- but classification/prediction time can be costly when the training set is large
two general strategies for alleviating this weakness:
- don't retain every training instance (edited nearest neighbor)
- use a smart data structure to look up nearest neighbors (e.g. a k-d tree)

Edited instance-based learning
select a subset of the instances that still provide accurate classifications
- incremental deletion
  start with all training instances in memory
  for each training instance (x_i, y_i)
    if other training instances provide the correct classification for (x_i, y_i)
      delete it from memory
- incremental growth
  start with an empty memory
  for each training instance (x_i, y_i)
    if the other training instances in memory don't correctly classify (x_i, y_i)
      add it to memory
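
A sketch of the incremental-growth variant, using a 1-NN check over the instances kept so far; the function name and the choice of 1-NN for the "correctly classify" test are illustrative assumptions.

    import numpy as np

    def incremental_growth(X_train, y_train):
        """Keep only the instances that the instances already in memory misclassify."""
        kept_X, kept_y = [], []
        for x_i, y_i in zip(X_train, y_train):
            if kept_X:
                # 1-NN prediction using the instances currently in memory
                dists = np.sqrt(((np.array(kept_X) - x_i) ** 2).sum(axis=1))
                if kept_y[int(np.argmin(dists))] == y_i:
                    continue                 # already classified correctly: don't store it
            kept_X.append(x_i)               # memory empty or misclassified: add it to memory
            kept_y.append(y_i)
        return np.array(kept_X), np.array(kept_y)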

k-d trees
a k-d tree is similar to a decision tree, except that
- each internal node stores one instance
- it splits on the median value of the feature having the highest variance
[figure: example k-d tree; each node stores one instance (a–j) and a split test, e.g. node f splits on x > 6, node c on y > 10, node h on y > 5, node e on y > 4, node b on x > 3, node g on x > 9, node i on x > 10, node d on y > 8, node a on y > 11, node j on y > 11.5]
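
A sketch of this construction rule, picking the highest-variance feature and splitting at its median instance; the Node class and the name build_kdtree are illustrative, and each node stores one training instance as described above.

    import numpy as np

    class Node:
        def __init__(self, instance, label, feature, threshold, left=None, right=None):
            self.instance, self.label = instance, label
            self.feature, self.threshold = feature, threshold   # split test: x[feature] > threshold
            self.left, self.right = left, right

    def build_kdtree(X, y):
        """Recursively split on the median value of the feature with the highest variance."""
        if len(X) == 0:
            return None
        f = int(np.argmax(X.var(axis=0)))        # feature with the highest variance
        order = np.argsort(X[:, f])
        m = len(X) // 2                          # position of the median instance
        median = order[m]
        left, right = order[:m], order[m + 1:]   # smaller half goes left, larger half goes right
        return Node(X[median], y[median], f, X[median, f],
                    build_kdtree(X[left], y[left]),
                    build_kdtree(X[right], y[right]))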

Finding nearest neighbors with a k-d tree
use branch-and-bound search
- a priority queue stores the nodes under consideration, along with a lower bound on their distance to the query instance
- the lower bound is given by the distance using a single feature
- average case: O(log₂ n)
- worst case: O(n)

Finding nearest neighbors in a k-d tree

NearestNeighbor(instance q)
    PQ = { }                          // minimizing priority queue
    best_dist = ∞                     // smallest distance seen so far
    PQ.push(root, 0)
    while PQ is not empty
        (node, bound) = PQ.pop()
        if (bound ≥ best_dist)        // bound is the best bound on all of PQ
            return best_node.instance // so the nearest neighbor has been found
        dist = distance(q, node.instance)
        if (dist < best_dist)
            best_dist = dist
            best_node = node
        if (q[node.feature] - node.threshold > 0)
            PQ.push(node.left, q[node.feature] - node.threshold)
            PQ.push(node.right, 0)
        else
            PQ.push(node.left, 0)
            PQ.push(node.right, node.threshold - q[node.feature])
    return best_node.instance
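
The same search in runnable Python, as a sketch built on the illustrative Node class from the construction sketch above. heapq serves as the minimizing priority queue (a counter breaks ties so nodes never get compared directly), and Euclidean distance stands in for distance(q, node.instance); the single-feature bound also lower-bounds the Manhattan distance used in the example on the next slide.

    import heapq
    import itertools
    import numpy as np

    def nearest_neighbor(root, q):
        """Branch-and-bound nearest-neighbor search over a k-d tree."""
        counter = itertools.count()           # tie-breaker so heap entries stay comparable
        pq = [(0.0, next(counter), root)]     # entries are (lower bound, tie-break, node)
        best_dist, best_node = float("inf"), None
        while pq:
            bound, _, node = heapq.heappop(pq)
            if node is None:
                continue
            if bound >= best_dist:            # bound is the best bound on all of pq
                break                         # so the nearest neighbor has been found
            dist = np.sqrt(((q - node.instance) ** 2).sum())
            if dist < best_dist:
                best_dist, best_node = dist, node
            gap = q[node.feature] - node.threshold
            if gap > 0:                       # query lies on the right side of the split
                heapq.heappush(pq, (gap, next(counter), node.left))
                heapq.heappush(pq, (0.0, next(counter), node.right))
            else:                             # query lies on the left side of the split
                heapq.heappush(pq, (0.0, next(counter), node.left))
                heapq.heappush(pq, (-gap, next(counter), node.right))
        return best_node.instance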

k-d tree example (Manhattan distance)
given query q = (2, 3)
[figure: the example k-d tree from the previous slide, with the query q marked]

trace of the search:

    distance   best distance   best node   priority queue
                                           (f, 0)
    4.0        4.0             f           (c, 0) (h, 4)
    10.0       4.0             f           (e, 0) (h, 4) (b, 7)
    1.0        1.0             e           (d, 1) (h, 4) (b, 7)

the next entry to pop, (d, 1), has a bound no smaller than the best distance 1.0, so the search stops and returns e as the nearest neighbor

Strengths of instance-based learning
- simple to implement
- training is very efficient
- adapts well to on-line learning
- robust to noisy training data (when k > 1)
- often works well in practice

Limitations of instance-based learning
- sensitive to the range of feature values
- sensitive to irrelevant and correlated features
  - although there are variants that learn weights for different features
  - later we'll talk about feature selection methods
- classification can be inefficient, although edited methods and k-d trees can help alleviate this weakness
- doesn't provide much insight into the problem domain because there is no explicit model