Data Mining Classification: Alternative Techniques. Lecture Notes for Chapter 4. Instance-Based Learning. Introduction to Data Mining, 2 nd Edition

Similar documents
9 Classification: KNN and SVM

Data Preprocessing. Supervised Learning

Data Mining. Lecture 03: Nearest Neighbor Learning

DATA MINING LECTURE 10B. Classification k-nearest neighbor classifier Naïve Bayes Logistic Regression Support Vector Machines

CS 584 Data Mining. Classification 1

CISC 4631 Data Mining

UVA CS 6316/4501 Fall 2016 Machine Learning. Lecture 15: K-nearest-neighbor Classifier / Bias-Variance Tradeoff. Dr. Yanjun Qi. University of Virginia

K- Nearest Neighbors(KNN) And Predictive Accuracy

Data Mining and Data Warehousing Classification-Lazy Learners

K-Nearest Neighbour Classifier. Izabela Moise, Evangelos Pournaras, Dirk Helbing

Lecture 6 K- Nearest Neighbors(KNN) And Predictive Accuracy

CS570: Introduction to Data Mining

CSC 411: Lecture 05: Nearest Neighbors

Topic 1 Classification Alternatives

Data mining. Classification k-nn Classifier. Piotr Paszek. (Piotr Paszek) Data mining k-nn 1 / 20

UVA CS 4501: Machine Learning. Lecture 10: K-nearest-neighbor Classifier / Bias-Variance Tradeoff. Dr. Yanjun Qi. University of Virginia

Data Mining and Machine Learning: Techniques and Algorithms

Lecture 3. Oct

Nearest Neighbor Classification. Machine Learning Fall 2017

Announcements:$Rough$Plan$$

Data Mining and Machine Learning. Instance-Based Learning. Rote Learning k Nearest-Neighbor Classification. IBL and Rule Learning

CS178: Machine Learning and Data Mining. Complexity & Nearest Neighbor Methods

Classification and Regression

CS7267 MACHINE LEARNING NEAREST NEIGHBOR ALGORITHM. Mingon Kang, PhD Computer Science, Kennesaw State University

Machine Learning. Nonparametric methods for Classification. Eric Xing , Fall Lecture 2, September 12, 2016

Voronoi Diagrams in the Plane. Chapter 5 of O Rourke text Chapter 7 and 9 of course text

6. Concluding Remarks

K Nearest Neighbor Wrap Up K- Means Clustering. Slides adapted from Prof. Carpuat

7. Nearest neighbors. Learning objectives. Centre for Computational Biology, Mines ParisTech

VECTOR SPACE CLASSIFICATION

7. Nearest neighbors. Learning objectives. Foundations of Machine Learning École Centrale Paris Fall 2015

Distribution-free Predictive Approaches

Instance-based Learning CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2015

Lecture Notes for Chapter 5

Nearest Neighbor Classifiers

Lecture Notes for Chapter 7. Introduction to Data Mining, 2 nd Edition. by Tan, Steinbach, Karpatne, Kumar

k-nearest Neighbor (knn) Sept Youn-Hee Han

COMPUTATIONAL INTELLIGENCE SEW (INTRODUCTION TO MACHINE LEARNING) SS18. Lecture 6: k-nn Cross-validation Regularization

Naïve Bayes for text classification

Text classification II CE-324: Modern Information Retrieval Sharif University of Technology

MODULE 7 Nearest Neighbour Classifier and its variants LESSON 11. Nearest Neighbour Classifier. Keywords: K Neighbours, Weighted, Nearest Neighbour

Instance and case-based reasoning

Voronoi Region. K-means method for Signal Compression: Vector Quantization. Compression Formula 11/20/2013

Mathematics of Data. INFO-4604, Applied Machine Learning University of Colorado Boulder. September 5, 2017 Prof. Michael Paul

CS6716 Pattern Recognition

Machine Learning: k-nearest Neighbors. Lecture 08. Razvan C. Bunescu School of Electrical Engineering and Computer Science

CS 340 Lec. 4: K-Nearest Neighbors

Data Mining. 3.5 Lazy Learners (Instance-Based Learners) Fall Instructor: Dr. Masoud Yaghini. Lazy Learners

Nearest Neighbor Methods

Geometric Decision Rules for Instance-based Learning Problems

Data Mining Classification: Alternative Techniques. Imbalanced Class Problem

Statistical Learning Part 2 Nonparametric Learning: The Main Ideas. R. Moeller Hamburg University of Technology

K-Nearest Neighbour (Continued) Dr. Xiaowei Huang

Analysis and Comparison of Efficient Techniques of Clustering Algorithms in Data Mining

Lecture Notes for Chapter 5

Statistics 202: Statistical Aspects of Data Mining

Instance-based Learning

10/5/2017 MIST.6060 Business Intelligence and Data Mining 1. Nearest Neighbors. In a p-dimensional space, the Euclidean distance between two records,

Lecture 6 Classification and Prediction

Fast Computation of Generalized Voronoi Diagrams Using Graphics Hardware

Supervised Learning: Nearest Neighbors

Approximate Nearest Neighbor Problem: Improving Query Time CS468, 10/9/2006

CPSC 695. Methods for interpolation and analysis of continuing surfaces in GIS Dr. M. Gavrilova

Geometric Computation: Introduction

CS5670: Computer Vision

Linear Regression and K-Nearest Neighbors 3/28/18

Intro to Artificial Intelligence

USING OF THE K NEAREST NEIGHBOURS ALGORITHM (k-nns) IN THE DATA CLASSIFICATION

Instance-Based Learning.

Going nonparametric: Nearest neighbor methods for regression and classification

Geometric data structures:

Decision Tree (Continued) and K-Nearest Neighbour. Dr. Xiaowei Huang

Computational Geometry. Algorithm Design (10) Computational Geometry. Convex Hull. Areas in Computational Geometry

International Journal of Scientific Research & Engineering Trends Volume 4, Issue 6, Nov-Dec-2018, ISSN (Online): X

Dijkstra's Algorithm

Non-Bayesian Classifiers Part I: k-nearest Neighbor Classifier and Distance Functions

Jarek Szlichta

ADVANCED CLASSIFICATION TECHNIQUES

Lecture Notes for Chapter 7. Introduction to Data Mining, 2 nd Edition. by Tan, Steinbach, Karpatne, Kumar

Lecture 7: Decision Trees

MIT 801. Machine Learning I. [Presented by Anna Bosman] 16 February 2018

CS570: Introduction to Data Mining

Points and Pointillism

KTH ROYAL INSTITUTE OF TECHNOLOGY. Lecture 14 Machine Learning. K-means, knn

Based on Raymond J. Mooney s slides

Chapter 7 TOPOLOGY CONTROL

CHAPTER 4 STOCK PRICE PREDICTION USING MODIFIED K-NEAREST NEIGHBOR (MKNN) ALGORITHM

Object Recognition. Lecture 11, April 21 st, Lexing Xie. EE4830 Digital Image Processing

Instance-Based Learning: Nearest neighbor and kernel regression and classificiation

The k-means Algorithm and Genetic Algorithm

Data Mining Cluster Analysis: Advanced Concepts and Algorithms. Lecture Notes for Chapter 8. Introduction to Data Mining, 2 nd Edition

Supervised Learning Classification Algorithms Comparison

Supervised Learning: K-Nearest Neighbors and Decision Trees

Case-Based Reasoning. CS 188: Artificial Intelligence Fall Nearest-Neighbor Classification. Parametric / Non-parametric.

CS 188: Artificial Intelligence Fall 2008

Module 4: Index Structures Lecture 16: Voronoi Diagrams and Tries. The Lecture Contains: Voronoi diagrams. Tries. Index structures

PROBLEM 4

CPSC 695. Geometric Algorithms in Biometrics. Dr. Marina L. Gavrilova

A Comparison of Nearest Neighbor Search Algorithms for Generic Object Recognition

Instance-Based Learning: Nearest neighbor and kernel regression and classificiation

Transcription:

Data Mining Classification: Alternative Techniques Lecture Notes for Chapter 4 Instance-Based Learning Introduction to Data Mining, 2 nd Edition by Tan, Steinbach, Karpatne, Kumar Instance Based Classifiers Examples: Rote-learner u Memorizes entire training data and performs classification only if attributes of record match one of the training examples exactly Nearest neighbor u Uses k closest points (nearest neighbors) for performing classification 02/14/2018 Introduction to Data Mining, 2 nd Edition 2

Nearest Neighbor Classifiers Basic idea: If it walks like a duck, quacks like a duck, then it s probably a duck Compute Distance Test Record Training Records Choose k of the nearest records 02/14/2018 Introduction to Data Mining, 2 nd Edition 3 Nearest-Neighbor Classifiers Unknown record Requires three things The set of labeled records Distance Metric to compute distance between records The value of k, the number of nearest neighbors to retrieve To classify an unknown record: Compute distance to other training records Identify k nearest neighbors Use class labels of nearest neighbors to determine the class label of unknown record (e.g., by taking majority vote) 02/14/2018 Introduction to Data Mining, 2 nd Edition 4

Definition of Nearest Neighbor X X X (a) 1-nearest neighbor (b) 2-nearest neighbor (c) 3-nearest neighbor K-nearest neighbors of a record x are data points that have the k smallest distances to x 02/14/2018 Introduction to Data Mining, 2 nd Edition 5 1 nearest-neighbor Voronoi Diagram 02/14/2018 Introduction to Data Mining, 2 nd Edition 6

Nearest Neighbor Classification Compute distance between two points: Euclidean distance d( p, q) = i ( p i q i 2 ) Determine the class from nearest neighbor list Take the majority vote of class labels among the k-nearest neighbors Weigh the vote according to distance u weight factor, w = 1/d 2 02/14/2018 Introduction to Data Mining, 2 nd Edition 7 Nearest Neighbor Classification Choosing the value of k: If k is too small, sensitive to noise points If k is too large, neighborhood may include points from other classes X 02/14/2018 Introduction to Data Mining, 2 nd Edition 8

Nearest Neighbor Classification Scaling issues Attributes may have to be scaled to prevent distance measures from being dominated by one of the attributes Example: u height of a person may vary from 1.5m to 1.8m u weight of a person may vary from 90lb to 300lb u income of a person may vary from $10K to $1M 02/14/2018 Introduction to Data Mining, 2 nd Edition 9 Nearest Neighbor Classification Selection of the right similarity measure is critical: 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 1 vs 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 Euclidean distance = 1.4142 for both pairs 02/14/2018 Introduction to Data Mining, 2 nd Edition 10

Nearest neighbor Classification k-nn classifiers are lazy learners since they do not build models explicitly Classifying unknown records are relatively expensive Can produce arbitrarily shaped decision boundaries Easy to handle variable interactions since the decisions are based on local information Selection of right proximity measure is essential Superfluous or redundant attributes can create problems Missing attributes are hard to handle 02/14/2018 Introduction to Data Mining, 2 nd Edition 11 Improving KNN Efficiency Avoid having to compute distance to all objects in the training set Multi-dimensional access methods (k-d trees) Fast approximate similarity search Locality Sensitive Hashing (LSH) Condensing Determine a smaller set of objects that give the same performance Editing Remove objects to improve efficiency 02/14/2018 Introduction to Data Mining, 2 nd Edition 12

KNN and Proximity Graphs Proximity graphs a graph in which two vertices are connected by an edge if and only if the vertices satisfy particular geometric requirements nearest neighbor graphs, minimum spanning trees Delaunay triangulations relative neighborhood graphs Gabriel graphs See recent papers by Toussaint G. T. Toussaint. Proximity graphs for nearest neighbor decision rules: recent progress. In Interface-2002, 34th Symposium on Computing and Statistics, ontreal, Canada, April 17 20 2002. G. T. Toussaint. Open problems in geometric methods for instance based learning. In Discrete and Computational Geometry, volume 2866 of Lecture Notes in Computer Science, pages 273 283, December 6-9, 2003. G. T. Toussaint. Geometric proximity graphs for improving nearest neighbor methods in instance-based learning and data mining. Int. J. Comput. Geometry Appl., 15(2):101 150, 2005. 02/14/2018 Introduction to Data Mining, 2 nd Edition 13