K-Nearest Neighbour Classifier. Izabela Moise, Evangelos Pournaras, Dirk Helbing


Reminder: supervised data mining, classification, decision trees.

K-Nearest Neighbour (kNN) Classifier

Classification steps
1. Training phase: a model is constructed from the training instances. The classification algorithm finds relationships between the predictors and the target; these relationships are summarised in a model.
2. Testing phase: the model is tested on a test sample whose class labels are known but were not used for training.
3. Usage phase: the model is used to classify new data whose class labels are unknown.
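
A minimal sketch of the three phases (not from the original slides), using scikit-learn's KNeighborsClassifier; the synthetic dataset and parameter values are purely illustrative assumptions.

# Sketch of the three classification phases with a kNN classifier.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data standing in for the training instances.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# 1. Training phase: build (here: store) the model from the training instances.
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)

# 2. Testing phase: evaluate on held-out instances whose labels are known.
print("test accuracy:", model.score(X_test, y_test))

# 3. Usage phase: classify new instances whose labels are unknown.
print("predicted classes:", model.predict(X_test[:3]))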

Instance-based Classification
Main idea: similar instances have similar classifications.
There is no clear separation between the three phases of classification.
Also called lazy classification, as opposed to eager classification.

Eager vs Lazy Classification
Eager:
the model is computed before classification
the model is independent of the test instance
the test instance is not included in the training data
avoids too much work at classification time
the model is not accurate for every individual instance
Lazy:
the model is computed during classification
the model depends on the test instance
the test instance is included in the training data
high accuracy at the level of each individual instance

k-nearest Neighbor (knn) Learning by analogy: Tell me who your friends are and I ll tell you who you are an instance is assigned to the most common class among the instances similar to it 1. how to measure similarity between instances 2. how to choose the most common class Izabela Moise, Evangelos Pournaras, Dirk Helbing 7

How does it work?


Comparing Objects
Problem: measure the similarity between instances (cf. text similarity). Instances may contain different types of data: numbers, colours, geolocations, booleans, etc.
Solution: convert all features of an instance into numerical values, and represent each instance as a vector of features in an n-dimensional space.
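
As a toy illustration of this conversion (the record layout and encoding scheme are assumptions made for the example): a numeric feature is kept as-is, a categorical colour is one-hot encoded, and a boolean becomes 0/1.

COLOURS = ["red", "green", "blue"]  # assumed finite set of colour values

def to_vector(record):
    # record = (age: number, colour: string, is_member: boolean)
    age, colour, is_member = record
    one_hot = [1.0 if colour == c else 0.0 for c in COLOURS]  # colour -> one-hot encoding
    return [float(age)] + one_hot + [1.0 if is_member else 0.0]

print(to_vector((42, "green", True)))  # [42.0, 0.0, 1.0, 0.0, 1.0]

In practice the numerical features are usually also rescaled (e.g. to [0, 1]) so that no single feature dominates the distance.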

An Example (shown as a figure in the original slides)

Distance Metrics
1. Euclidean distance
2. Manhattan distance
3. Minkowski distance
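
The slide lists the metrics without their formulas; for two instances $x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$ the standard definitions are:

$d_{\text{Euclidean}}(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$

$d_{\text{Manhattan}}(x, y) = \sum_{i=1}^{n} |x_i - y_i|$

$d_{\text{Minkowski}}(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}$

The Minkowski distance generalises the other two: p = 1 gives the Manhattan distance and p = 2 the Euclidean distance.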

Choosing k
Classification is sensitive to the correct selection of k.
If k is too small, the classifier overfits: it performs very well on the training set compared to its true performance on unseen test data.
Small k: less stable, more influenced by noise.
Larger k: less precise, higher bias.
A common rule of thumb is k ≈ √n, where n is the number of training instances.
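
One common way to pick k in practice (not covered on the slide) is cross-validation over a set of candidate values; the sketch below assumes scikit-learn, and the candidate values and synthetic dataset are illustrative.

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=300, n_features=5, random_state=0)

best_k, best_score = None, -1.0
for k in [1, 3, 5, 7, 9, 11, 15, 21]:
    # Mean accuracy over 5 cross-validation folds for this value of k.
    score = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=5).mean()
    if score > best_score:
        best_k, best_score = k, score

print("best k:", best_k, "cv accuracy:", round(best_score, 3))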

Pros and Cons
Pros:
simple to implement and use
robust to noisy data, by averaging over the k nearest neighbours
kNN classification is based solely on local information
the decision boundaries can be of arbitrary shape

Cons:
curse of dimensionality: the distance can be dominated by irrelevant attributes
O(n) cost for each instance to be classified
classifying a new instance is more expensive than with a precomputed model