CS 584 Data Mining. Classification 1


Classification: Definition. Given a collection of records (the training set), where each record contains a set of attributes and one of the attributes is the class, find a model for the class attribute as a function of the values of the other attributes. Goal: previously unseen records should be assigned a class as accurately as possible. A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets: the training set is used to build the model and the test set is used to validate it.

Illustrating Classification Task

Training Set:
  Tid  Attrib1  Attrib2  Attrib3  Class
  1    Yes      Large    125K     No
  2    No       Medium   100K     No
  3    No       Small    70K      No
  4    Yes      Medium   120K     No
  5    No       Large    95K      Yes
  6    No       Medium   60K      No
  7    Yes      Large    220K     No
  8    No       Small    85K      Yes
  9    No       Medium   75K      No
  10   No       Small    90K      Yes

Test Set:
  Tid  Attrib1  Attrib2  Attrib3  Class
  11   No       Small    55K      ?
  12   Yes      Medium   80K      ?
  13   Yes      Large    110K     ?
  14   No       Small    95K      ?
  15   No       Large    67K      ?

The training set is fed to a learning algorithm, which induces a model (induction: Training Set -> Learning Algorithm -> Learn Model); the model is then applied to the test set to predict the missing class labels (deduction: Model -> Apply Model -> Test Set).
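As a minimal sketch of this induction/deduction loop, the Python below uses a trivial majority-class baseline as a stand-in for a real learning algorithm; the records and field names are invented for illustration.

```python
# A minimal sketch of the induction/deduction loop above. The "model"
# is a trivial majority-class baseline; records and field names are
# invented for illustration, not the slide's actual algorithm.

def fit_majority(records):
    """Induction: learn the most common class label in the training set."""
    labels = [r["class"] for r in records]
    return max(set(labels), key=labels.count)

def apply_model(model_label, records):
    """Deduction: apply the model to records with unknown class labels."""
    return [model_label for _ in records]

training_set = [{"attrib1": "Yes", "attrib2": "Large", "class": "No"},
                {"attrib1": "No", "attrib2": "Small", "class": "Yes"},
                {"attrib1": "No", "attrib2": "Medium", "class": "No"}]
test_set = [{"attrib1": "No", "attrib2": "Small", "class": None}]

model = fit_majority(training_set)
print(apply_model(model, test_set))  # -> ['No']
```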

Examples of Classification Task:
- Predicting tumor cells as benign or malignant
- Classifying credit card transactions as legitimate or fraudulent
- Categorizing news stories as finance, weather, entertainment, sports, etc.

The Classification Problem (informal definition): given a collection of annotated data (in this case, five instances of Katydids and five of Grasshoppers), decide what type of insect the unlabeled example is. [Figure: photographs of Katydids and Grasshoppers, plus an unlabeled insect captioned "Katydid or Grasshopper?"]

For any domain of interest, we can measure features:
- Color {Green, Brown, Gray, Other}
- Has Wings?
- Abdomen Length
- Thorax Length
- Antennae Length
- Spiracle Diameter
- Leg Length
- Mandible Size

We can store the features in a database (My_Collection):

  Insect ID  Abdomen Length  Antennae Length  Insect Class
  1          2.7             5.5              Grasshopper
  2          8.0             9.1              Katydid
  3          0.9             4.7              Grasshopper
  4          1.1             3.1              Grasshopper
  5          5.4             8.5              Katydid
  6          2.9             1.9              Grasshopper
  7          6.1             6.6              Katydid
  8          0.5             1.0              Grasshopper
  9          8.3             6.6              Katydid
  10         8.1             4.7              Katydid

  previously unseen instance = 11  5.1  7.0  ???

The classification problem can now be expressed as: given a training database (My_Collection), predict the class label of a previously unseen instance.

[Figure: scatter plot of the training instances, Antenna Length (y-axis, 1-10) vs. Abdomen Length (x-axis, 1-10), with Grasshoppers and Katydids marked as separate classes.]

[Figure: the same Antenna Length vs. Abdomen Length scatter plot, on a larger dataset that we will also use as a motivating example.] Each of these data objects is called an exemplar, a (training) example, an instance, or a tuple.

We will return to the previous slide in two minutes. In the meantime, we are going to play a quick game. I am going to show you some classification problems which were shown to pigeons! Let us see if you are as smart as a pigeon!

Pigeon Problem 1. [Figure: pairs of bars; each pair gives the heights of the left and right bars. Examples of class A: (3, 4), (1.5, 5), (6, 8), (2.5, 5). Examples of class B: (5, 2.5), (5, 2), (8, 3), (4.5, 3).]

Pigeon Problem 1. [Figure: the same class A and class B examples, plus two queries: "What class is this object?" (8, 1.5) and "What about this one, A or B?" (4.5, 7).]

Pigeon Problem 1. This is a B! Here is the rule: if the left bar is smaller than the right bar, it is an A; otherwise it is a B.

Pigeon Problem 2. [Figure: examples of class A: (4, 4), (5, 5), (6, 6), (3, 3). Examples of class B: (5, 2.5), (2, 5), (5, 3), (2.5, 3). Queries: "Oh! This one's hard!" (8, 1.5) and "Even I know this one" (7, 7).]

Pigeon Problem 2. The rule is as follows: if the two bars are of equal size, it is an A; otherwise it is a B. So this one (7, 7) is an A.

Pigeon Problem 3. [Figure: examples of class A: (4, 4), (1, 5), (6, 3), (3, 7). Examples of class B: (5, 6), (7, 5), (4, 8), (7, 7). Query: "This one is really hard! What is this, A or B?" (6, 6).]

Pigeon Problem 3. It is a B! The rule is as follows: if the square of the sum of the two bars is less than or equal to 100, it is an A; otherwise it is a B.
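To underline that each of these "rules" is just a classifier, here is a sketch of all three as Python functions; the bar-pair encodings follow the figures above.

```python
# The three pigeon-problem rules as classifiers, following the rules
# stated on the slides. Each takes the two bar heights and returns a
# class label.

def problem1(left, right):
    # If the left bar is smaller than the right bar, it is an A.
    return "A" if left < right else "B"

def problem2(left, right):
    # If the two bars are of equal size, it is an A.
    return "A" if left == right else "B"

def problem3(left, right):
    # If the square of the sum of the two bars is at most 100, it is an A.
    return "A" if (left + right) ** 2 <= 100 else "B"

print(problem1(4.5, 7))  # -> A  (second query from Problem 1)
print(problem2(7, 7))    # -> A  (query from Problem 2)
print(problem3(6, 6))    # -> B  (query from Problem 3)
```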

Why did we spend so much time on this game? Because we wanted to show that almost all classification problems have a geometric interpretation; see the next three slides.

Pigeon Problem 1. [Figure: the class A and class B bar pairs plotted as points, Left Bar (y-axis, 1-10) vs. Right Bar (x-axis, 1-10); the two classes fall on opposite sides of the diagonal.] Here is the rule again: if the left bar is smaller than the right bar, it is an A; otherwise it is a B.

Pigeon Problem 2. [Figure: the bar pairs plotted as points, Left Bar (y-axis, 1-10) vs. Right Bar (x-axis, 1-10); class A lies exactly on the diagonal.] Let me look it up... here it is: the rule is, if the two bars are of equal size, it is an A; otherwise it is a B.

Pigeon Problem 3. [Figure: the bar pairs plotted as points, Left Bar (y-axis, 10-100) vs. Right Bar (x-axis, 10-100).] The rule again: if the square of the sum of the two bars is less than or equal to 100, it is an A; otherwise it is a B.

[Figure: back to the insect data: the Antenna Length vs. Abdomen Length scatter plot of Grasshoppers and Katydids.]

previously unseen instance = 11  5.1  7.0  ??? [Figure: the unseen instance projected into the Antenna Length vs. Abdomen Length space alongside the Katydids and Grasshoppers.] We can project the previously unseen instance into the same space as the database. We have now abstracted away the details of our particular problem; it will be much easier to talk about points in space.

Simple Linear Classifier (Linear Discriminant Analysis), after R. A. Fisher (1890-1962). [Figure: a straight line separating the two classes in the Antenna Length vs. Abdomen Length space.] If the previously unseen instance is above the line, the class is Katydid; else the class is Grasshopper.
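As a sketch, the decision rule is just the sign of a linear function of the features. The weights below were chosen by eye to separate the ten records in My_Collection (abdomen + antenna > 10 means Katydid); a real linear discriminant would be fitted from the data.

```python
# A minimal linear decision rule: classify by which side of the line
# w1*x1 + w2*x2 + b = 0 the point falls on. The weights here are
# illustrative, hand-picked to separate the My_Collection examples,
# not the line actually fitted on the slide.

def linear_classify(abdomen, antenna, w=(1.0, 1.0), b=-10.0):
    score = w[0] * abdomen + w[1] * antenna + b
    return "Katydid" if score > 0 else "Grasshopper"

print(linear_classify(5.1, 7.0))  # unseen instance 11 -> Katydid
```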

The simple linear classifier is also defined for higher-dimensional spaces; there, we can visualize the decision boundary as a hyperplane.

It is interesting to think about what would happen in this example if we did not have the third dimension.

We can no longer get perfect accuracy with the simple linear classifier. We could try to solve this problem by using a simple quadratic classifier or a simple cubic classifier; however, as we will see later, this is probably a bad idea.

Which of the Pigeon Problems can be solved by the Simple Linear Classifier? [Figure: the three problems with a candidate line drawn in each; the line is perfect for Problem 1, useless for Problem 2, and pretty good for Problem 3.] Problems that can be solved by a linear classifier are called linearly separable.

A Famous Problem: R. A. Fisher's Iris Dataset. 3 classes, 50 instances of each class. The task is to classify Iris plants into one of 3 varieties (Iris Setosa, Iris Versicolor, Iris Virginica) using the Petal Length and Petal Width. [Figure: example photographs of the three varieties.]

We can generalize the piecewise linear classifier to N classes by fitting N-1 lines. In this case we first learned the line to (perfectly) discriminate between Setosa and Virginica/Versicolor, then we learned a line to approximately discriminate between Virginica and Versicolor. [Figure: the two fitted lines in Petal Length vs. Petal Width space.]

If petal width > 3.272 - (0.325 * petal length) then class = Virginica
Elseif petal width ... [second rule truncated in the source]
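As a sketch, the first fitted line translates directly into code; the second branch of the rule is cut off in the transcription, so it is left as a stub rather than invented.

```python
# The first fitted line from the slide, as code. The second line of
# the piecewise rule is truncated in the source, so only the Virginica
# test is implemented; the Setosa/Versicolor split is left as a stub.

def classify_iris(petal_length, petal_width):
    if petal_width > 3.272 - 0.325 * petal_length:
        return "Virginica"
    # The slide's second rule (separating Setosa from Versicolor) is
    # cut off in the transcription; a real implementation would test
    # the second fitted line here.
    return "Setosa or Versicolor"

print(classify_iris(6.0, 2.2))  # -> Virginica (2.2 > 3.272 - 1.95 = 1.322)
```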

Case Study: Sea Bass vs Salmon?

An Example: sorting incoming fish on a conveyor according to species (sea bass or salmon) using optical sensing.

Problem Analysis: set up a camera and take some sample images, then extract features:
- Length
- Lightness
- Width
- Number and shape of fins
- Position of the mouth
- etc.

Classification: select the length of the fish as a possible feature for discrimination.

The length alone is a poor feature! Next, select the lightness as a possible feature.

Threshold decision boundary and cost relationship: move our decision boundary toward smaller values of lightness in order to minimize the cost (reduce the number of sea bass that are classified as salmon!). This is the task of decision theory.

Adopt the lightness and add the width of the fish: each fish is now a feature vector $x^T = [x_1, x_2]$, where $x_1$ is the lightness and $x_2$ is the width.

We might add other features that are not correlated with the ones we already have, but care should be taken not to reduce performance by adding noisy features. Ideally, the best decision boundary is the one that provides optimal performance, such as in the following figure. [Figure not reproduced in the transcription.]

However, our satisfaction is premature, because the central aim of designing a classifier is to correctly classify novel input: the issue of generalization!

Misclassifications: sea bass misclassified as salmon, and salmon misclassified as sea bass.

Cost-sensitive classification: penalize misclassifications of one class more than the other. This changes the decision boundaries.
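A sketch of the idea on a single lightness feature; the sample values and costs below are invented for illustration. Raising the sea-bass-as-salmon cost pushes the chosen threshold x* toward smaller lightness, as the earlier slide describes.

```python
# Cost-sensitive threshold selection on one feature (lightness).
# Sample values and costs are hypothetical, for illustration only.

sea_bass = [6.1, 5.8, 7.0, 6.5, 5.2]  # hypothetical lightness values
salmon   = [3.9, 4.4, 5.5, 4.1, 4.9]

COST_BASS_AS_SALMON = 2.0  # penalize this mistake more heavily
COST_SALMON_AS_BASS = 1.0

def total_cost(threshold):
    # Decision rule: lightness > threshold -> sea bass, else salmon.
    cost = sum(COST_BASS_AS_SALMON for x in sea_bass if x <= threshold)
    cost += sum(COST_SALMON_AS_BASS for x in salmon if x > threshold)
    return cost

candidates = sorted(sea_bass + salmon)
best = min(candidates, key=total_cost)
print(f"best threshold x* = {best}, cost = {total_cost(best)}")
```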

What if salmon is more expensive than bass? What if bass is more expensive than salmon? Each choice gives a new decision boundary x*. [Figure: two plots showing the shifted decision boundary x* under each cost assignment.]

Classification Techniques:
- Instance-Based Classifiers
- Decision Tree-based Methods
- Rule-based Methods
- Neural Networks
- Naïve Bayes and Bayesian Belief Networks
- Support Vector Machines

Instance-Based Classifiers: store the training records, and use the training records directly to predict the class label of unseen cases. [Figure: a set of stored cases with attributes Atr1 ... AtrN and class labels (A, B, B, C, A, C, B), and an unseen case with attributes Atr1 ... AtrN to be labeled.]

Instance-Based Classifiers, examples:
- Rote-learner: memorizes the entire training data and performs classification only if the attributes of the record exactly match one of the training examples (sketched below).
- Nearest neighbor: uses the closest points (nearest neighbors) to perform classification.
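A rote learner is small enough to sketch directly; this version abstains (returns None) when no stored record matches exactly. The example cases are invented.

```python
# A rote learner: the degenerate instance-based classifier. It answers
# only when the query exactly matches a stored record's attributes.

def rote_classify(x, records):
    for stored_attributes, label in records:
        if stored_attributes == x:
            return label
    return None  # no exact match -> the rote learner abstains

cases = [(("Yes", "Large"), "A"), (("No", "Small"), "B")]
print(rote_classify(("No", "Small"), cases))   # -> B
print(rote_classify(("No", "Medium"), cases))  # -> None
```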

Nearest Neighbor Classifiers. Basic idea: if it walks like a duck and quacks like a duck, then it's probably a duck. [Figure: a test record among training records; compute the distances and choose the k nearest records.]

Nearest-Neighbor Classifiers require three things:
- The set of stored records
- A distance metric to compute the distance between records
- The value of k, the number of nearest neighbors to retrieve

To classify an unknown record:
- Compute its distance to the training records
- Identify the k nearest neighbors
- Use the class labels of the nearest neighbors to determine the class label of the unknown record (e.g., by taking a majority vote)

A minimal sketch of this recipe appears below.
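The sketch stores the My_Collection records from earlier, uses Euclidean distance as the metric, and takes a majority vote over the k nearest records.

```python
import math
from collections import Counter

# A minimal k-nearest-neighbor classifier following the three
# ingredients above: stored records, a distance metric (Euclidean),
# and the value of k. Training data is the My_Collection table.

train = [((2.7, 5.5), "Grasshopper"), ((8.0, 9.1), "Katydid"),
         ((0.9, 4.7), "Grasshopper"), ((1.1, 3.1), "Grasshopper"),
         ((5.4, 8.5), "Katydid"),     ((2.9, 1.9), "Grasshopper"),
         ((6.1, 6.6), "Katydid"),     ((0.5, 1.0), "Grasshopper"),
         ((8.3, 6.6), "Katydid"),     ((8.1, 4.7), "Katydid")]

def euclidean(p, q):
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def knn_classify(x, records, k=3):
    neighbors = sorted(records, key=lambda rec: euclidean(x, rec[0]))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

print(knn_classify((5.1, 7.0), train))  # unseen instance 11 -> Katydid
```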

Nearest Neighbor Classifiers (due to Evelyn Fix, 1904-1965, and Joe Hodges, 1922-2000). [Figure: the Antenna Length vs. Abdomen Length scatter plot.] If the nearest instance to the previously unseen instance is a Katydid, the class is Katydid; else the class is Grasshopper.

The Nearest Neighbor Classifier is sensitive to outliers. [Figure: the same scatter plot with an outlier near the unseen instance, pulling the 1-NN decision the wrong way.] Solution: use the K nearest neighbors instead, and take a majority vote!

Definition of Nearest Neighbor. [Figure: three panels around a test point x: (a) 1-nearest neighbor, (b) 2-nearest neighbor, (c) 3-nearest neighbor.] The k-nearest neighbors of a record x are the data points that have the k smallest distances to x.

Voronoi Diagram: the decision regions of the 1-nearest-neighbor classifier form a Voronoi diagram, with one cell around each training point.

Nearest Neighbor Classification. Compute the distance between two points, e.g. the Euclidean distance:

$d(p, q) = \sqrt{\sum_i (p_i - q_i)^2}$

Determine the class from the nearest neighbor list: take the majority vote of the class labels among the k nearest neighbors, optionally weighing each vote according to distance, e.g. with weight factor $w = 1/d^2$.
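Continuing the kNN sketch above (it reuses euclidean() and the train records defined there), a distance-weighted vote replaces the plain count; the small eps is an added guard against a zero distance, not part of the slide's formula.

```python
from collections import defaultdict

# Distance-weighted kNN voting: each of the k nearest neighbors votes
# with weight 1/d^2 instead of counting equally. Reuses euclidean()
# and the train records from the earlier kNN sketch.

def weighted_knn_classify(x, records, k=3, eps=1e-12):
    neighbors = sorted(records, key=lambda rec: euclidean(x, rec[0]))[:k]
    scores = defaultdict(float)
    for point, label in neighbors:
        scores[label] += 1.0 / (euclidean(x, point) ** 2 + eps)  # w = 1/d^2
    return max(scores, key=scores.get)

print(weighted_knn_classify((5.1, 7.0), train))  # -> Katydid
```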

Nearest Neighbor Classification: choosing the value of k.
- If k is too small, the classifier is sensitive to noise points.
- If k is too large, the neighborhood may include points from other classes.
- What if we have a tie?

One common way to choose k is sketched below.
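The slide does not prescribe a method, but a common one is to score candidate values of k by leave-one-out accuracy on the training records, preferring odd k so that two-class majority votes cannot tie. This sketch reuses knn_classify() and train from the earlier kNN sketch.

```python
# Leave-one-out selection of k: classify each training record using
# all the others, and keep the k with the highest accuracy. Odd
# candidates avoid ties in a two-class majority vote.

def loo_accuracy(records, k):
    hits = 0
    for i, (x, label) in enumerate(records):
        rest = records[:i] + records[i + 1:]
        hits += knn_classify(x, rest, k) == label
    return hits / len(records)

best_k = max([1, 3, 5, 7], key=lambda k: loo_accuracy(train, k))
print(best_k)
```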

The nearest neighbor algorithm is sensitive to irrelevant features. Suppose the following is true: if an insect's antenna is longer than 5.5, it is a Katydid; otherwise it is a Grasshopper. Using just the antenna length, we get perfect classification! [Figure: the training data on the antenna-length axis, split perfectly at 5.5.] Suppose, however, we add an irrelevant feature, for example the insect's mass. Using both the antenna length and the insect's mass with the 1-NN algorithm, we get the wrong classification! [Figure: the same data in the two-dimensional antenna/mass space, where the nearest neighbor now comes from the wrong class.]

How do we mitigate the nearest neighbor algorithm's sensitivity to irrelevant features?
- Use more training instances
- Ask an expert which features are relevant to the task
- Use statistical tests to try to determine which features are useful
- Search over feature subsets (sketched below)
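A sketch of the last option: exhaustively score every feature subset by leave-one-out 1-NN accuracy and keep the best. It reuses loo_accuracy() and knn_classify() from the sketches above; exhaustive search is only feasible for a handful of features.

```python
from itertools import combinations

# Search over feature subsets: project each record onto a subset of
# feature indices and score that subset by leave-one-out accuracy.

def project(records, feature_idxs):
    return [(tuple(x[i] for i in feature_idxs), label)
            for x, label in records]

def best_feature_subset(records, n_features, k=1):
    subsets = [c for r in range(1, n_features + 1)
               for c in combinations(range(n_features), r)]
    return max(subsets, key=lambda s: loo_accuracy(project(records, s), k))

print(best_feature_subset(train, n_features=2))  # e.g. (0, 1)
```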

The nearest neighbor algorithm is sensitive to the units of measurement. [Figure, left: x-axis measured in centimeters, y-axis measured in dollars; the nearest neighbor to the pink unknown instance is red. Right: x-axis measured in millimeters, y-axis measured in dollars; the nearest neighbor to the pink unknown instance is blue.] One solution is to normalize the units to pure numbers.

Scaling Issues: attributes may have to be scaled to prevent distance measures from being dominated by one of the attributes. Example:
- the height of a person may vary from 1.5 m to 1.8 m
- the weight of a person may vary from 90 lb to 300 lb
- the income of a person may vary from $10K to $1M

A normalization sketch follows.
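A minimal sketch of min-max scaling to the unit interval, one common normalization; the three example rows mirror the ranges above, and the code assumes no column is constant.

```python
# Min-max scaling: rescale each attribute to [0, 1] so that no single
# attribute dominates the Euclidean distance. Assumes every column
# contains at least two distinct values (no division by zero).

def min_max_scale(rows):
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [tuple((v - l) / (h - l) for v, l, h in zip(row, lo, hi))
            for row in rows]

# Hypothetical (height m, weight lb, income $) records.
people = [(1.5, 90, 10_000), (1.8, 300, 1_000_000), (1.7, 160, 50_000)]
print(min_max_scale(people))  # every attribute now lies in [0, 1]
```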

Advantages/Disadvantages of Nearest Neighbor

Advantages:
- Simple to implement
- Handles correlated features (arbitrary class shapes)
- Defined for any distance measure
- Handles streaming data trivially

Disadvantages:
- Very sensitive to irrelevant features
- Slow classification time for large datasets
- Works best for real-valued datasets
- Does not build a model explicitly: nearest neighbor classifiers are lazy learners, as opposed to eager learners such as decision tree induction