Data Mining. 3.5 Lazy Learners (Instance-Based Learners). Fall 2008. Instructor: Dr. Masoud Yaghini

Outline: Introduction, k-nearest-neighbor Classifiers, References

Introduction

Introduction: Lazy vs. eager learning. Eager learning (e.g., decision tree induction, Bayesian classification, rule-based classification): given a training set, it constructs a classification model before receiving new (e.g., test) data to classify. Lazy learning (e.g., k-nearest-neighbor classifiers, case-based reasoning classifiers): it simply stores the training data (or does only minor processing) and waits until it is given a new instance. Lazy learners spend less time in training but more time in predicting.

Introduction: Lazy learners store training examples and delay the processing ("lazy evaluation") until a new instance must be classified. Accuracy: a lazy method effectively uses a richer hypothesis space, since it uses many local linear functions to form its implicit global approximation to the target function; an eager learner must commit to a single hypothesis that covers the entire instance space.

Example Problem: Face Recognition. We have a database of (say) 1 million face images. We are given a new image and want to find the most similar images in the database. Represent faces by (relatively) invariant values, e.g., the ratio of nose width to eye width. Each image is represented by a large number of numerical features. Problem: given the features of a new face, find those in the DB that are close in at least ¾ (say) of the features.

Introduction: Typical approaches are the k-nearest-neighbor approach, in which instances are represented as points in a Euclidean space, and case-based reasoning, which uses symbolic representations and knowledge-based inference.

k-nearest-neighbor Classifiers

k-nearest-neighbor Classifiers: All instances correspond to points in an n-dimensional space. The training tuples are described by n attributes, and each tuple represents a point in that n-dimensional space. A k-nearest-neighbor classifier searches the pattern space for the k training tuples that are closest to the unknown tuple.

k-nearest-neighbor Classifiers. Example: We are interested in classifying the type of drug a patient should be prescribed, based on the age of the patient and the patient's sodium/potassium ratio (Na/K). The dataset includes 200 patients.

Scatter plot: On the scatter plot, light gray points indicate drug Y, medium gray points indicate drug A or X, and dark gray points indicate drug B or C.

Close-up of neighbors to new patient 2: k = 1 => drugs B and C (dark gray); k = 2 => ?; k = 3 => drugs A and X (medium gray). Main questions: How many neighbors should we consider, that is, what is k? How do we measure distance? Should all points be weighted equally, or should some points have more influence than others?

k-nearest-neighbor Classifiers: The nearest neighbors are defined in terms of Euclidean distance, dist(X_1, X_2). The Euclidean distance between two points or tuples, say X_1 = (x_11, x_12, ..., x_1n) and X_2 = (x_21, x_22, ..., x_2n), is dist(X_1, X_2) = \sqrt{\sum_{i=1}^{n} (x_{1i} - x_{2i})^2}. For nominal attributes, the distance is either 0 or 1.
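As an illustration (not from the original slides), here is a minimal Python sketch of this distance, assuming the tuples are plain numeric sequences; the names and values are illustrative:

```python
import numpy as np

def euclidean_distance(x1, x2):
    """Euclidean distance between two numeric tuples of equal length."""
    x1, x2 = np.asarray(x1, dtype=float), np.asarray(x2, dtype=float)
    return float(np.sqrt(np.sum((x1 - x2) ** 2)))

# Example: two patients described by (normalized age, normalized Na/K ratio)
print(euclidean_distance([0.25, 0.80], [0.30, 0.65]))  # ~0.158
```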

k-nearest-neighbor Classifiers: Typically, we normalize the values of each attribute in advance. This helps prevent attributes with initially large ranges (such as income) from outweighing attributes with initially smaller ranges (such as binary attributes). Min-max normalization maps all attribute values into [0, 1]: v' = (v - min_A) / (max_A - min_A).
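A small sketch of min-max normalization under the same assumption of plain numeric attribute columns (names are illustrative):

```python
import numpy as np

def min_max_normalize(column):
    """Scale a numeric attribute column into [0, 1]."""
    column = np.asarray(column, dtype=float)
    lo, hi = column.min(), column.max()
    if hi == lo:                       # constant attribute: map everything to 0
        return np.zeros_like(column)
    return (column - lo) / (hi - lo)

ages = [15, 23, 40, 61, 74]
print(min_max_normalize(ages))         # approximately [0, 0.136, 0.424, 0.78, 1]
```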

k-nearest-neighbor Classifiers: A common policy for missing values is to assume they are maximally distant (given normalized attributes). Another popular metric is the Manhattan (city-block) metric, which takes the absolute differences of the attribute values without squaring them: dist(X_1, X_2) = \sum_{i=1}^{n} |x_{1i} - x_{2i}|.
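For comparison, a sketch of the Manhattan metric on the same kind of numeric tuples:

```python
import numpy as np

def manhattan_distance(x1, x2):
    """Manhattan (city-block) distance: sum of absolute attribute differences."""
    x1, x2 = np.asarray(x1, dtype=float), np.asarray(x2, dtype=float)
    return float(np.sum(np.abs(x1 - x2)))

print(manhattan_distance([0.25, 0.80], [0.30, 0.65]))  # ~0.20
```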

k-nearest-neighbor Classifiers: For k-nearest-neighbor classification, the unknown tuple is assigned the most common class among its k nearest neighbors. When k = 1, the unknown tuple is assigned the class of the training tuple that is closest to it in pattern space. Nearest-neighbor classifiers can also be used for prediction, that is, to return a real-valued prediction for a given unknown tuple. In this case, the classifier returns the average value of the real-valued labels associated with the k nearest neighbors of the unknown tuple.
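A minimal sketch of this rule, assuming numeric, already-normalized attributes and a simple linear scan; the function name and the toy data are illustrative, not from the lecture:

```python
from collections import Counter
import numpy as np

def knn_predict(X_train, y_train, query, k=3, regression=False):
    """Plain k-NN by linear scan: majority vote for classification,
    mean of the neighbors' labels for real-valued prediction."""
    X_train = np.asarray(X_train, dtype=float)
    query = np.asarray(query, dtype=float)
    distances = np.sqrt(((X_train - query) ** 2).sum(axis=1))
    neighbor_idx = np.argsort(distances)[:k]
    neighbor_labels = [y_train[i] for i in neighbor_idx]
    if regression:
        return float(np.mean(neighbor_labels))
    return Counter(neighbor_labels).most_common(1)[0][0]

# Toy example: (normalized age, normalized Na/K) -> drug
X = [[0.1, 0.9], [0.2, 0.8], [0.8, 0.2], [0.9, 0.1]]
y = ["Y", "Y", "B", "B"]
print(knn_predict(X, y, [0.15, 0.85], k=3))   # majority class of the 3 nearest neighbors: Y
```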

Categorical Attributes: A simple method is to compare the corresponding value of the attribute in tuple X1 with that in tuple X2. If the two are identical (e.g., tuples X1 and X2 both have the color blue), then the difference between the two is taken as 0, otherwise 1. Other methods may incorporate more sophisticated schemes for differential grading (e.g., where a larger difference score is assigned, say, for blue and white than for blue and black).

Missing Values: In general, if the value of a given attribute A is missing in tuple X1 and/or in tuple X2, we assume the maximum possible difference. For categorical attributes, we take the difference value to be 1 if either one or both of the corresponding values of A are missing. If A is numeric and missing from both tuples X1 and X2, then the difference is also taken to be 1. If only one value is missing and the other (which we'll call v') is present and normalized, then we can take the difference to be either |1 - v'| or |0 - v'|, whichever is greater.
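A sketch of this per-attribute difference policy, assuming missing values are encoded as None and numeric attributes are already normalized into [0, 1] (the function name is illustrative):

```python
def attribute_difference(a, b, numeric=True):
    """Per-attribute difference with the missing-value policy above.
    Missing values are represented as None; numeric values are assumed
    to be min-max normalized into [0, 1]."""
    if a is None and b is None:
        return 1.0                      # both values missing: maximum difference
    if a is None or b is None:
        if not numeric:
            return 1.0                  # categorical with any value missing
        v = b if a is None else a
        return max(abs(1.0 - v), abs(0.0 - v))   # worst case for the present value
    if numeric:
        return abs(a - b)
    return 0.0 if a == b else 1.0       # categorical: 0 if identical, else 1

print(attribute_difference(0.3, None))                        # 0.7
print(attribute_difference("blue", "white", numeric=False))   # 1.0
```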

Determining a good value for k: k can be determined experimentally. Starting with k = 1, we use a test set to estimate the error rate of the classifier. This process can be repeated, each time incrementing k to allow for one more neighbor. The k value that gives the minimum error rate may be selected. In general, the larger the number of training tuples, the larger the value of k will be.
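A sketch of this search over k, assuming the knn_predict() function from the earlier sketch is in scope and that a held-out test set is available:

```python
import numpy as np

def choose_k(X_train, y_train, X_test, y_test, k_values=range(1, 21)):
    """Estimate the test error rate for each candidate k and keep the best one."""
    best_k, best_error = None, float("inf")
    for k in k_values:
        predictions = [knn_predict(X_train, y_train, x, k=k) for x in X_test]
        error = float(np.mean([p != t for p, t in zip(predictions, y_test)]))
        if error < best_error:
            best_k, best_error = k, error
    return best_k, best_error
```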

Finding nearest neighbors efficiently: The simplest way of finding a nearest neighbor is a linear scan of the data; classification then takes time proportional to the product of the number of instances in the training and test sets. Nearest-neighbor search can be done more efficiently using appropriate data structures. There are two methods that represent the training data in a tree structure: kd-trees (k-dimensional trees) and ball trees.

kd-trees: A kd-tree is a binary tree that divides the input space with a hyperplane and then splits each partition again, recursively. The data structure is called a kd-tree because it stores a set of points in k-dimensional space, k being the number of attributes.

kd-tree example

Using kd-trees: example. The target, which is not one of the instances in the tree, is marked by a star. The leaf node of the region containing the target is colored black. To determine whether a closer neighbor exists, first check whether it is possible for a closer neighbor to lie within the node's sibling region; then back up to the parent node and check its sibling.

More on kd-trees: Complexity depends on the depth of the tree, and the amount of backtracking required depends on the quality of the tree. How do we build a good tree? We need to find a good split point and split direction. Split direction: the direction with the greatest variance. Split point: the median value, or the value closest to the mean, along that direction. This can be applied recursively, as sketched below.
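A minimal sketch of such a construction, together with a simple 1-nearest-neighbor search that backs up and checks a sibling subtree only when it could contain a closer point (class and function names are illustrative):

```python
import numpy as np

class KDNode:
    def __init__(self, point, axis, left=None, right=None):
        self.point, self.axis, self.left, self.right = point, axis, left, right

def build_kdtree(points):
    """Recursively split on the axis with the greatest variance,
    using the median point along that axis as the split point."""
    if len(points) == 0:
        return None
    points = np.asarray(points, dtype=float)
    axis = int(np.argmax(points.var(axis=0)))      # split direction: greatest variance
    order = np.argsort(points[:, axis])
    median = len(points) // 2                      # split point: median along that axis
    return KDNode(points[order[median]], axis,
                  build_kdtree(points[order[:median]]),
                  build_kdtree(points[order[median + 1:]]))

def nearest(node, target, best=None, best_dist=float("inf")):
    """1-NN search: descend to the leaf region, then back up and
    search the sibling subtree only if it could hold a closer point."""
    if node is None:
        return best, best_dist
    d = float(np.linalg.norm(node.point - target))
    if d < best_dist:
        best, best_dist = node.point, d
    diff = target[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best, best_dist = nearest(near, target, best, best_dist)
    if abs(diff) < best_dist:                      # sibling region may contain a closer point
        best, best_dist = nearest(far, target, best, best_dist)
    return best, best_dist

tree = build_kdtree([[2, 3], [5, 4], [9, 6], [4, 7], [8, 1], [7, 2]])
point, dist = nearest(tree, np.array([6.0, 3.0]))
print(point, dist)                                 # closest stored point to (6, 3) and its distance
```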

Building trees incrementally: A big advantage of instance-based learning is that the classifier can be updated incrementally: just add a new training instance! We can do the same with kd-trees. Heuristic strategy: find the leaf node containing the new instance; place the instance into that leaf if it is empty; otherwise, split the leaf. The tree should be rebuilt occasionally.

References

References
J. Han and M. Kamber, Data Mining: Concepts and Techniques, Elsevier Inc., 2006 (Chapter 6).
I. H. Witten and E. Frank, Data Mining: Practical Machine Learning Tools and Techniques, 2nd Edition, Elsevier Inc., 2005 (Chapter 6).

The end