Data Mining and Machine Learning: Techniques and Algorithms


Instance-based classification
Data Mining and Machine Learning: Techniques and Algorithms
Eneldo Loza Mencía (eneldo@ke.tu-darmstadt.de), Knowledge Engineering Group, TU Darmstadt
International Week 2019, 21.1.-24.1., University of Economics, Prague

Outline
Rote Learning
k-Nearest Neighbor Classification
Choosing k
Distance functions
Instance and Feature Weighting
Efficiency
kd-trees
Ball trees

Rote Learning

Day    Temperature  Outlook   Humidity  Windy  Play Golf?
07-05  hot          sunny     high      false  no
07-06  hot          sunny     high      true   no
07-07  hot          overcast  high      false  yes
07-09  cool         rain      normal    false  yes
07-10  cool         overcast  normal    true   yes
07-12  mild         sunny     high      false  no
07-14  cool         sunny     normal    false  yes
07-15  mild         rain      normal    false  yes
07-20  mild         sunny     normal    true   yes
07-21  mild         overcast  high      true   yes
07-22  hot          overcast  normal    false  yes
07-23  mild         rain      high      true   no
07-26  cool         rain      normal    true   no
07-30  mild         rain      high      false  yes
today  cool         sunny     normal    false  yes

Instance Based Classifiers No model is learned: the stored training instances themselves represent the knowledge. The training instances are searched for the instance that most closely resembles the new instance ("lazy learning"). Examples: Rote-learner: memorizes the entire training data and performs classification only if the attributes of the record match one of the training examples exactly. Nearest-neighbor classifier: uses the k closest points (nearest neighbors) for performing classification.
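
A minimal sketch of a rote learner in Python, under the assumption that examples are plain tuples of attribute values; the class name `RoteLearner` and the `None` return for "no exact match" are illustrative choices, not from the slides.

```python
# Rote-learner sketch: memorize training examples verbatim and answer
# only when an exact attribute match is found.

class RoteLearner:
    def fit(self, X, y):
        # store every (attribute tuple -> class) pair as-is
        self.memory = {tuple(x): label for x, label in zip(X, y)}
        return self

    def predict(self, x):
        # classify only if the attributes match a stored example exactly
        return self.memory.get(tuple(x), None)  # None = "don't know"

X = [("hot", "sunny", "high", "false"), ("cool", "sunny", "normal", "false")]
y = ["no", "yes"]
learner = RoteLearner().fit(X, y)
print(learner.predict(("cool", "sunny", "normal", "false")))  # -> "yes"
print(learner.predict(("mild", "rain", "high", "true")))      # -> None (no exact match)
```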

Nearest Neighbor Classification

Day       Temperature  Outlook   Humidity  Windy  Play Golf?
07-05     hot          sunny     high      false  no
07-06     hot          sunny     high      true   no
07-07     hot          overcast  high      false  yes
07-09     cool         rain      normal    false  yes
07-10     cool         overcast  normal    true   yes
07-12     mild         sunny     high      false  no
07-14     cool         sunny     normal    false  yes
07-15     mild         rain      normal    false  yes
07-20     mild         sunny     normal    true   yes
07-21     mild         overcast  high      true   yes
07-22     hot          overcast  normal    false  yes
07-23     mild         rain      high      true   no
07-26     cool         rain      normal    true   no
12-30     mild         rain      high      false  yes
tomorrow  mild         sunny     normal    false  yes

Nearest Neighbor Classifier k-Nearest Neighbor algorithms classify a new example by comparing it to all previously seen examples. The classifications of the k most similar previous cases are used for predicting the classification of the current example. Training: the training examples are used for providing a library of sample cases and for re-scaling the similarity function to maximize performance. (Figure: a new example is compared to the stored sample cases to produce a classification.)

Nearest Neighbors (Figure: (a) 1-nearest neighbor, (b) 2-nearest neighbor, (c) 3-nearest neighbor) The k nearest neighbors of an example x are the data points that have the k smallest distances to x.

Prediction The predicted class is determined from the nearest neighbor list. Classification: take the majority vote of the class labels among the k nearest neighbors, $\hat{y} = \arg\max_c \sum_{i=1}^{k} \mathbb{1}(y_i = c)$, where $\mathbb{1}(y_i = c)$ is the indicator function (1 if $y_i = c$, 0 if $y_i \neq c$). This can easily be extended to regression: predict the average of the class values of the k nearest neighbors, $\hat{y} = \frac{1}{k} \sum_{i=1}^{k} y_i$.
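
A plain-Python sketch of both prediction rules (majority vote for classification, average for regression), assuming the training data is a list of (feature tuple, label/target) pairs and Euclidean distance; the helper names are illustrative, not from the slides.

```python
# k-NN prediction sketch: find the k closest training pairs, then vote or average.
from collections import Counter
import math

def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def k_nearest(train, query, k, dist=euclidean):
    # train is a list of (x, y) pairs; return the k pairs closest to query
    return sorted(train, key=lambda xy: dist(xy[0], query))[:k]

def predict_class(train, query, k=3):
    neighbors = k_nearest(train, query, k)
    votes = Counter(y for _, y in neighbors)     # counts = sum of indicators 1(y_i = c)
    return votes.most_common(1)[0][0]            # argmax over classes c

def predict_value(train, query, k=3):
    # regression variant: targets y must be numeric here
    neighbors = k_nearest(train, query, k)
    return sum(y for _, y in neighbors) / k      # (1/k) * sum of y_i

train = [((1, 5), "yes"), ((2, 3), "yes"), ((6, 1), "no"), ((7, 2), "no")]
print(predict_class(train, (1.5, 4.0), k=3))     # -> "yes"
```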

Weighted Prediction Often the prediction can be improved if the influence of each neighbor is weighted: $\hat{y} = \frac{\sum_{i=1}^{k} w_i y_i}{\sum_{i=1}^{k} w_i}$. The weights typically depend on the distance, e.g. $w_i = \frac{1}{d(x_i, x)^2}$. Note: with weighted distances, we could use all examples for classification (Inverse Distance Weighting).
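
A sketch of inverse-distance-weighted voting, again assuming (feature tuple, label) pairs; the small `eps` term is an added safeguard against division by zero when a neighbor coincides with the query, not something specified on the slide.

```python
# Distance-weighted k-NN classification sketch: each neighbor votes with weight 1/d^2.
import math
from collections import defaultdict

def _dist(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def predict_weighted(train, query, k=3, eps=1e-9):
    # train: list of (x, y) pairs; weight each of the k nearest neighbors by 1/d^2
    neighbors = sorted(train, key=lambda xy: _dist(xy[0], query))[:k]
    scores = defaultdict(float)
    for x, y in neighbors:
        scores[y] += 1.0 / (_dist(x, query) ** 2 + eps)   # w_i = 1 / d(x_i, x)^2
    return max(scores, key=scores.get)

train = [((1, 5), "yes"), ((2, 3), "yes"), ((6, 1), "no"), ((7, 2), "no")]
print(predict_weighted(train, (1.5, 4.0), k=3))           # -> "yes"
```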

Nearest-Neighbor Classifiers (Figure: unknown example record) A nearest-neighbor classifier requires three things: the set of stored examples, a distance metric to compute the distance between examples, and the value of k, the number of nearest neighbors to retrieve. To classify an unknown example: compute the distance to the other training examples, identify the k nearest neighbors, and use the class labels of the nearest neighbors to determine the class label of the unknown example (e.g., by taking a majority vote).

Lazy Learning Algorithms k-NN is considered a lazy learning algorithm: it defers data processing until it receives a request to classify an unlabelled example, replies to a request for information by combining its stored training data, and discards the constructed answer and any intermediate results. Other names for lazy algorithms: memory-based, instance-based, exemplar-based, case-based, experience-based. This strategy is opposed to eager learning algorithms, which compile their data into a compressed description or model, discard the training data after compilation of the model, and classify incoming patterns using the induced model.

Choosing the value of k If k is too small, the classifier is sensitive to noise in the data (misclassified examples); a greater k leads to smoother decision boundaries. (Figure: Steven Skiena)

Choosing the value of k If k is too large, the neighborhood may include points from other classes; in the limiting case, all examples are considered and the largest class is always predicted. Good values can be found, e.g., by evaluating various values of k with cross-validation on the training data, as sketched below.
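
One practical way to do this, sketched here under the assumption that scikit-learn is installed: score a handful of candidate k values with 5-fold cross-validation and keep the best one. The candidate list and dataset are illustrative.

```python
# Choosing k by cross-validation on the training data (scikit-learn sketch).
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
scores = {}
for k in (1, 3, 5, 7, 9, 11):
    clf = KNeighborsClassifier(n_neighbors=k)
    scores[k] = cross_val_score(clf, X, y, cv=5).mean()   # mean accuracy over 5 folds
best_k = max(scores, key=scores.get)
print(scores, "-> best k:", best_k)
```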

Distance Functions A distance function computes the distance between two examples so that we can find the nearest neighbor to a given example. General idea: reduce the distance $d(x_1, x_2)$ of two examples to the distances $d_A(v_1, v_2)$ between two values of attribute A. Popular choices: Euclidean distance (L2), the straight line between two points, $d(x_1, x_2) = \sqrt{\sum_A d_A(v_{1,A}, v_{2,A})^2}$; Manhattan or city-block distance (L1), the sum of axis-parallel line segments, $d(x_1, x_2) = \sum_A d_A(v_{1,A}, v_{2,A})$. Other choices are possible. Important in higher-dimensional spaces: do we care about deviations in all dimensions or primarily the biggest? (Slide credit: Steven Skiena, CSE 519)
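
A small plain-Python sketch of both distances, with the per-attribute distance taken to be the absolute difference of numeric values; the example points are illustrative.

```python
# Euclidean (L2) and Manhattan (L1) distances, decomposed into per-attribute differences.
import math

def euclidean(x1, x2):
    # L2: square root of the sum of squared per-attribute differences
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))

def manhattan(x1, x2):
    # L1: sum of absolute per-attribute differences (axis-parallel segments)
    return sum(abs(a - b) for a, b in zip(x1, x2))

print(euclidean((1, 5), (4, 7)))   # sqrt(13) ~ 3.606
print(manhattan((1, 5), (4, 7)))   # 3 + 2 = 5
```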

Distance Functions for Numerical Attributes Numerical attributes: the distance between two attribute values is $d_A(v_1, v_2) = |v_1 - v_2|$. Normalization: different attributes are measured on different scales, so values need to be normalized to [0,1]: $v_i' = \frac{v_i - \min_j v_j}{\max_j v_j - \min_j v_j}$. Note: this normalization assumes a (roughly) uniform distribution of attribute values. For other distributions, other normalizations might be preferable, e.g. logarithmic for salaries.
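
A plain-Python sketch of the min-max normalization above; the handling of a constant attribute (mapping every value to 0) is an added convention, not specified on the slide.

```python
# Min-max normalization of one numeric attribute to the range [0, 1].
def min_max_normalize(values):
    lo, hi = min(values), max(values)
    if hi == lo:                       # constant attribute: map everything to 0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

temperatures = [85, 80, 83, 70, 68, 72, 64]
print(min_max_normalize(temperatures))
```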

Distance Functions for Symbolic Attributes 0/1 distance: $d_A(v_1, v_2) = 0$ if $v_1 = v_2$, and $1$ if $v_1 \neq v_2$. Value Difference Metric (VDM) (Stanfill & Waltz 1986): two values are similar if they have approximately the same distribution over all classes (similar relative frequencies in all classes).
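
A sketch of the 0/1 distance and a simplified VDM variant (absolute differences of class-conditional relative frequencies, i.e. exponent 1), assuming the per-value class frequencies are estimated from training data; the helper names and toy data are illustrative.

```python
# 0/1 distance plus a simplified Value Difference Metric for symbolic attributes.
from collections import Counter, defaultdict

def overlap(v1, v2):
    return 0.0 if v1 == v2 else 1.0

def vdm_table(values, labels):
    # estimate P(class | attribute value) from the training data
    counts = defaultdict(Counter)
    for v, y in zip(values, labels):
        counts[v][y] += 1
    return {v: {c: n / sum(cnt.values()) for c, n in cnt.items()}
            for v, cnt in counts.items()}

def vdm(v1, v2, table, classes):
    # values are close if their class distributions are similar
    return sum(abs(table[v1].get(c, 0.0) - table[v2].get(c, 0.0)) for c in classes)

outlook = ["sunny", "sunny", "overcast", "rain", "overcast", "sunny", "rain"]
play    = ["no",    "no",    "yes",      "yes",  "yes",      "yes",   "no"]
table = vdm_table(outlook, play)
print(vdm("sunny", "overcast", table, {"yes", "no"}))   # larger: different class profiles
print(vdm("sunny", "rain", table, {"yes", "no"}))       # smaller: more similar profiles
```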

Other Distance Functions Other distances are possible. Hierarchical attributes: distance of the values in the hierarchy, e.g., the length of the shortest path from v1 to v2: d(Canada, USA) = 2, d(Canada, Japan) = 4.

Other Distance Functions Other distances are possible. Hierarchical attributes: distance of the values in the hierarchy, e.g., the length of the shortest path from v1 to v2. In general, distances are domain-dependent and can be chosen appropriately. Distances for missing values: not all attribute values may be specified for an example. Common policy: assume missing values to be maximally distant.

Feature and Instance Weighting Feature weighting: not all dimensions are equally important; comparisons on some dimensions might even be completely irrelevant for the prediction task, yet straightforward distance functions give equal weight to all dimensions. RELIEF gives a higher weight to features which better discriminate between the classes (Kira & Rendell, ICML-92). Instance weighting: we assign a weight to each instance; instances with lower weights are always distant and hence have a low impact on classification. PEBLS (Cost & Salzberg, 1993): $d'(x_1, x_2) = \frac{1}{w_{x_1} w_{x_2}} d(x_1, x_2)$ with $w_x = \frac{\text{number of times } x \text{ has correctly predicted the class}}{\text{number of times } x \text{ has been used for prediction}}$.
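
A sketch of the instance-weighting idea following the formula as reconstructed above; the function names, counts, and example points are purely illustrative, and in a real system the usage/correctness counts would be maintained while classifying training data.

```python
# Instance weighting sketch: unreliable stored examples get a low reliability weight,
# which inflates their distance and reduces their influence on classification.
import math

def base_dist(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def reliability(correct, used):
    # w_x = (# times x predicted the class correctly) / (# times x was used)
    return correct / used if used else 1.0

def weighted_dist(x1, x2, w1, w2):
    # d'(x1, x2) = d(x1, x2) / (w_x1 * w_x2): low weights push examples away
    return base_dist(x1, x2) / (w1 * w2)

print(weighted_dist((1, 5), (4, 7), reliability(9, 10), reliability(9, 10)))  # reliable pair
print(weighted_dist((1, 5), (4, 7), reliability(9, 10), reliability(3, 10)))  # noisy neighbor, much farther
```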

Efficiency of NN algorithms Very efficient in training: only store the training data. Not so efficient in testing: computing the distance measure to every training example is much more expensive than, e.g., decision trees. Note that k-NN and 1-NN are equal in terms of efficiency: retrieving the k nearest neighbors is (almost) not more expensive than retrieving a single nearest neighbor, since the k nearest neighbors can be maintained in a queue.

Finding nearest neighbors efficiently The simplest way of finding the nearest neighbor is a linear scan of the data; classification then takes time proportional to the product of the number of instances in the training and test sets. Nearest-neighbor search can be done more efficiently using appropriate data structures: kd-trees and ball trees.

kd-trees Common setting (others are possible): each level of the tree corresponds to one of the attributes; the order of attributes can be arbitrary, fixed, and cyclic. Each level splits according to its attribute: ideally use the median value (results in balanced trees); often simply use the value of the next example.
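
A minimal construction sketch in plain Python that cycles through the attributes by level and splits at the median, as suggested above; `Node`, `build_kdtree`, and the sample points are illustrative names and data.

```python
# kd-tree construction sketch: one splitting attribute per level, median split.
from collections import namedtuple

Node = namedtuple("Node", ["point", "axis", "left", "right"])

def build_kdtree(points, depth=0):
    if not points:
        return None
    axis = depth % len(points[0])                 # cycle through the dimensions
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2                        # median keeps the tree balanced
    return Node(point=points[mid],
                axis=axis,
                left=build_kdtree(points[:mid], depth + 1),
                right=build_kdtree(points[mid + 1:], depth + 1))

tree = build_kdtree([(5, 4), (4, 7), (2, 3), (8, 1), (7, 2), (9, 6)])
print(tree.point, tree.axis)                      # root splits on the first attribute at the median
```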

Using kd-trees: example The effect of a kd-tree is to partition the (multi-dimensional) sample space according to the underlying data distribution: finer partitioning in regions with high density, coarser partitioning in regions with low density. For a given query point: descend the tree to find the data point lying in the cell that contains the query point; then examine surrounding cells if they overlap the ball centered at the query point whose radius is given by the closest data point so far. Recursively back up one level and check the distance to the split point; if there is overlap, also search the other branch. Only a few cells have to be searched.

Using kd-trees: example Assume we have the example [1,5] and use the unweighted Euclidean distance $d(e_1, e_2) = \sqrt{\sum_A d_A(e_1, e_2)^2}$. Sort the example down the tree: it ends in the left successor of [4,7]. Compute the distance to the example in the leaf: $d([1,5],[4,7]) = \sqrt{(1-4)^2 + (5-7)^2} = \sqrt{13}$. Now we have to look into rectangles that may contain a nearer example; remember the distance to the closest example so far, $d_{min} = \sqrt{13}$.

Using kd-trees: example Go up one level (to example [4,7]) and compute the distance to the closest point on this split (difference only in X): $d([1,5],[4,*]) = \sqrt{(4-1)^2 + 0^2} = 3$. If this is smaller than the current best distance, $d([1,5],[4,*]) = 3 < \sqrt{13} = d_{min}$, then we could have a closer example in the right subtree of [4,7], which in our case does not contain any example. Done.

Using kd-trees: example Go up one level (to example [5,4]) and compute the distance to the closest point on this split (difference only in Y): $d([1,5],[*,4]) = \sqrt{0^2 + (5-4)^2} = 1$. If this is smaller than the current best distance, $d([1,5],[*,4]) = 1 < \sqrt{13} = d_{min}$, then we could have a closer example in the area Y < 4: go down the other branch and repeat recursively.

Using kd-trees: example Go down to the leaf [2,3] and compute the distance to the example in this leaf: $d([1,5],[2,3]) = \sqrt{(1-2)^2 + (5-3)^2} = \sqrt{5}$. Since this is smaller than the current best distance, $d([1,5],[2,3]) = \sqrt{5} < \sqrt{13} = d_{min}$, the example in the leaf is the new nearest neighbor and $d_{min} = \sqrt{5}$. This is recursively repeated until we have processed the root node; then no more distances have to be computed.
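
A compact, self-contained sketch of the whole search procedure (build, descend to the query's cell, then back up and only visit the other branch when the split is closer than $d_{min}$). The sample points are chosen so that the query [1,5] ends up with nearest neighbor [2,3] at distance $\sqrt{5}$, as in the worked example, although the exact tree layout may differ from the figure; requires Python 3.8+ for `math.dist`.

```python
# kd-tree nearest-neighbour search sketch with the "back up and prune" rule.
import math
from collections import namedtuple

Node = namedtuple("Node", ["point", "axis", "left", "right"])

def build_kdtree(points, depth=0):
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return Node(pts[mid], axis,
                build_kdtree(pts[:mid], depth + 1),
                build_kdtree(pts[mid + 1:], depth + 1))

def nearest(node, query, best=None):
    if node is None:
        return best
    d = math.dist(node.point, query)
    if best is None or d < best[1]:
        best = (node.point, d)                       # new nearest neighbour, d_min = d
    diff = query[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, query, best)                # search the branch containing the query first
    if abs(diff) < best[1]:                          # split closer than d_min: also check other side
        best = nearest(far, query, best)
    return best

tree = build_kdtree([(5, 4), (4, 7), (2, 3), (8, 1), (7, 2), (9, 6)])
print(nearest(tree, (1, 5)))                         # -> ((2, 3), sqrt(5))
```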

Ball trees Problem with kd-trees: corners. Observation: there is no need to make sure that regions don't overlap, so we can use balls (hyperspheres) instead of hyperrectangles. A ball tree organizes the data into a tree of k-dimensional hyperspheres. It normally allows for a better fit to the data and thus a more efficient search.
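
In practice these structures are rarely built by hand; as an illustration, and assuming scikit-learn and NumPy are installed, both tree types can be built and queried through the same interface. The data and parameters here are arbitrary.

```python
# Library sketch: kd-tree vs. ball tree for nearest-neighbour queries in scikit-learn.
import numpy as np
from sklearn.neighbors import BallTree, KDTree

rng = np.random.default_rng(0)
X = rng.random((1000, 15))                 # 15-dimensional data, where ball trees tend to cope better

ball = BallTree(X, leaf_size=40)
kd = KDTree(X, leaf_size=40)

query = rng.random((1, 15))
dist_b, idx_b = ball.query(query, k=3)     # distances and indices of the 3 nearest neighbours
dist_k, idx_k = kd.query(query, k=3)
print(idx_b, idx_k)                        # both structures return the same neighbours
```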

Discussion Nearest-neighbor methods are often very accurate. They assume all attributes are equally important; remedy: attribute selection or attribute weights. Possible remedies against noisy instances: take a majority vote over the k nearest neighbors, or remove noisy instances from the dataset (difficult!). Statisticians have used k-NN since the early 1950s: if $n \to \infty$ and $k/n \to 0$, the error approaches the minimum. k-NN can model arbitrary decision boundaries, but is somewhat inefficient at classification time: the straightforward application may be too slow, and kd-trees become inefficient when the number of attributes is too large (approximately > 10), while ball trees work well in higher-dimensional spaces. There are several similarities with rule learning.