CS7267 MACHINE LEARNING NEAREST NEIGHBOR ALGORITHM. Mingon Kang, PhD Computer Science, Kennesaw State University

KNN K-Nearest Neighbors (KNN) is a simple but very powerful classification algorithm. It classifies based on a similarity measure, is non-parametric, and is a lazy learner: it does not learn until a test example is given. Whenever we have new data to classify, we find its K nearest neighbors in the training data. Ref: https://www.slideshare.net/tilanigunawardena/k-nearest-neighbors

KNN: Classification Approach A query is classified by a MAJORITY VOTE of its neighbors' classes: it is assigned to the most common class among its K nearest neighbors (found by measuring the distance between data points). Ref: https://www.slideshare.net/tilanigunawardena/k-nearest-neighbors

KNN: Example Ref: https://www.slideshare.net/tilanigunawardena/k-nearest-neighbors

KNN: Pseudocode Ref: https://www.slideshare.net/phuongnguyen6/text-categorization
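The transcription of this slide contains only the title and reference, so here is a minimal sketch of the kNN classification procedure described on the surrounding slides. This is an illustrative implementation, not the original pseudocode; names such as knn_classify and train_X are assumptions.

```python
import numpy as np
from collections import Counter

def knn_classify(train_X, train_y, query, k=3):
    """Classify one query point by majority vote of its k nearest
    training examples, using Euclidean distance (illustrative sketch)."""
    # Distance from the query to every training example
    dists = np.sqrt(((train_X - query) ** 2).sum(axis=1))
    # Indices of the k smallest distances
    nearest = np.argsort(dists)[:k]
    # Majority vote among the labels of the k nearest neighbors
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy usage: two classes in 2-D
train_X = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.9]])
train_y = np.array(['A', 'A', 'B', 'B'])
print(knn_classify(train_X, train_y, np.array([1.1, 1.0]), k=3))  # -> 'A'
```

Note that no training step is needed beyond storing the data, which is exactly the "lazy learning" property mentioned earlier.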

KNN: Example Ref: http://www.scholarpedia.org/article/k-nearest_neighbor

KNN: Euclidean distance matrix Ref: http://www.scholarpedia.org/article/k-nearest_neighbor

Decision Boundaries Voronoi diagram: given a set of data points, it describes the regions that are nearest to each point. Each line segment of the decision boundary is equidistant between two points of opposite classes. Ref: https://www.slideshare.net/tilanigunawardena/k-nearest-neighbors

Decision Boundaries With a large number of examples and possible noise in the labels, the decision boundary can become very irregular, which is an overfitting problem. Ref: https://www.slideshare.net/tilanigunawardena/k-nearest-neighbors

Effect of K A larger K produces a smoother decision boundary. When K == N, the classifier always predicts the majority class. Ref: https://www.slideshare.net/tilanigunawardena/k-nearest-neighbors

How to choose k? Empirically optimal k? Ref: https://www.slideshare.net/tilanigunawardena/k-nearest-neighbors
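The slide leaves the question open; a common empirical answer is to select K by cross-validation over a range of candidate values. Below is a minimal sketch using scikit-learn's KNeighborsClassifier and cross_val_score; the iris dataset and the candidate range 1..31 are illustrative assumptions, not part of the original slides.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Try odd K values (odd K avoids ties in two-class problems) and keep the best
scores = {}
for k in range(1, 32, 2):
    clf = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(clf, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```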

Pros and Cons Pros: learning and implementation are extremely simple and intuitive; flexible decision boundaries. Cons: irrelevant or correlated features have a high impact and must be eliminated; typically difficult to handle high dimensionality; high computational cost in both memory and classification time. Ref: https://www.slideshare.net/tilanigunawardena/k-nearest-neighbors

Similarity and Dissimilarity Similarity: a numerical measure of how alike two data objects are. It is higher when the objects are more alike and often falls in the range [0, 1]. Dissimilarity: a numerical measure of how different two data objects are. It is lower when the objects are more alike; the minimum dissimilarity is often 0, and the upper limit varies. Proximity refers to either a similarity or a dissimilarity.

Euclidean Distance Euclidean distance: $\text{dist}(a,b) = \sqrt{\sum_{k=1}^{p} (a_k - b_k)^2}$, where $p$ is the number of dimensions (attributes) and $a_k$ and $b_k$ are, respectively, the $k$-th attributes (components) of data objects $a$ and $b$. Standardization is necessary if scales differ.

Euclidean Distance
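This slide's table is not preserved in the transcription; as an illustration of the formula above, here is a short sketch that computes a pairwise Euclidean distance matrix. The four 2-D points are made up for the example and are not taken from the original slide.

```python
import numpy as np

# Illustrative points (p = 2 attributes each)
points = np.array([[0.0, 2.0],
                   [2.0, 0.0],
                   [3.0, 1.0],
                   [5.0, 1.0]])

# Pairwise distances: dist[i, j] = sqrt(sum_k (points[i, k] - points[j, k])^2)
diff = points[:, None, :] - points[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))
print(np.round(dist, 3))
```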

Minkowski Distance Minkowski distance is a generalization of Euclidean distance: $\text{dist}(a,b) = \left( \sum_{k=1}^{p} |a_k - b_k|^r \right)^{1/r}$, where $r$ is a parameter, $p$ is the number of dimensions (attributes), and $a_k$ and $b_k$ are, respectively, the $k$-th attributes (components) of data objects $a$ and $b$.

Minkowski Distance: Examples r = 1: city block (Manhattan, taxicab, L1 norm) distance. A common example is the Hamming distance, which is just the number of bits that differ between two binary vectors. r = 2: Euclidean distance. r → ∞: supremum (L_max norm, L_∞ norm) distance, the maximum difference between any components of the two vectors. Do not confuse r with p; all of these distances are defined for any number of dimensions.
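A small sketch of the three special cases listed above, with two made-up vectors chosen only for illustration:

```python
import numpy as np

a = np.array([3.0, 2.0, 0.0, 5.0])
b = np.array([1.0, 0.0, 4.0, 2.0])

def minkowski(a, b, r):
    """Minkowski distance: (sum_k |a_k - b_k|^r)^(1/r)."""
    return (np.abs(a - b) ** r).sum() ** (1.0 / r)

print(minkowski(a, b, 1))    # r = 1, Manhattan: 2 + 2 + 4 + 3 = 11
print(minkowski(a, b, 2))    # r = 2, Euclidean: sqrt(4 + 4 + 16 + 9) ~ 5.745
print(np.abs(a - b).max())   # r -> infinity, supremum: 4
```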

Cosine Similarity If $d_1$ and $d_2$ are two document vectors, then $\cos(d_1, d_2) = (d_1 \cdot d_2) / (\|d_1\| \, \|d_2\|)$, where $\cdot$ indicates the vector dot product and $\|d\|$ is the length of vector $d$. Example: d1 = (3, 2, 0, 5, 0, 0, 0, 2, 0, 0), d2 = (1, 0, 0, 0, 0, 0, 0, 1, 0, 2). d1 · d2 = 3*1 + 2*0 + 0*0 + 5*0 + 0*0 + 0*0 + 0*0 + 2*1 + 0*0 + 0*2 = 5. ||d1|| = (3*3 + 2*2 + 0*0 + 5*5 + 0*0 + 0*0 + 0*0 + 2*2 + 0*0 + 0*0)^0.5 = 42^0.5 ≈ 6.481. ||d2|| = (1*1 + 0*0 + 0*0 + 0*0 + 0*0 + 0*0 + 0*0 + 1*1 + 0*0 + 2*2)^0.5 = 6^0.5 ≈ 2.449. cos(d1, d2) ≈ 0.3150.

Cosine Similarity cos(d_1, d_2) = 1: exactly the same direction; 0: orthogonal; -1: exactly opposite.
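For a concrete check of the worked example two slides above, here is a minimal sketch that reproduces the cosine-similarity computation (the vectors are the ones given in the example):

```python
import numpy as np

d1 = np.array([3, 2, 0, 5, 0, 0, 0, 2, 0, 0], dtype=float)
d2 = np.array([1, 0, 0, 0, 0, 0, 0, 1, 0, 2], dtype=float)

# cos(d1, d2) = (d1 . d2) / (||d1|| * ||d2||)
cos = d1 @ d2 / (np.linalg.norm(d1) * np.linalg.norm(d2))
print(round(cos, 4))  # ~0.3150
```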

Feature scaling Standardize the range of the independent variables (the features of the data); also known as normalization or standardization.

Standardization Standardization, or Z-score normalization, rescales the data so that the mean is zero and the standard deviation is one: $x_{\text{norm}} = (x - \mu)/\sigma$, where $\mu$ is the mean and $\sigma$ is the standard deviation; the rescaled values are called standard scores.

Min-Max scaling Scale the data to a fixed range between 0 and 1: $x_{\text{norm}} = (x - x_{\min}) / (x_{\max} - x_{\min})$.
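A short sketch of both scaling methods from the last two slides, applied to a made-up feature vector (the sample values are illustrative only):

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0, 100.0])

# Z-score standardization: zero mean, unit standard deviation
z = (x - x.mean()) / x.std()

# Min-max scaling: rescale to the range [0, 1]
mm = (x - x.min()) / (x.max() - x.min())

print(np.round(z, 3))
print(np.round(mm, 3))
```

Either rescaling keeps a single large-scale feature (here the value 100) from dominating the Euclidean distances used by KNN, which is why the Euclidean Distance slide notes that standardization is necessary when scales differ.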