Topic 1 Classification Alternatives


Topic 1 Classification Alternatives [Jiawei Han, Micheline Kamber, Jian Pei. 2011. Data Mining: Concepts and Techniques. 3rd Ed. Morgan Kaufmann. ISBN: 9380931913.]

Contents
1. Lazy Learners (k-Nearest-Neighbor Classifiers)
2. Classification Using Frequent Patterns
3. Support Vector Machines (SVMs)
4. Classification by Backpropagation (ANNs)
5. Bayesian Belief Networks
6. Other Classification Methods

Introduction: Basic Concepts. Eager learning (e.g., decision tree induction) spends a lot of time on model building (training/learning). - Once a model has been built, classifying a test example is extremely fast. Lazy learning (e.g., the k-nearest-neighbor classifier) does not require model building (no training). - Classifying a test example is relatively expensive because we must compute the proximity between the test example and every training example.

When we want to classify an unknown (unseen) tuple, a k-nearest-neighbor (k-NN) classifier searches the pattern space for the k training tuples that are closest to the unknown tuple. These k training tuples are the k nearest neighbors of the unknown tuple. For k-NN classification, the unknown tuple is assigned the most common class among its k nearest neighbors (i.e., the majority class of its k nearest neighbors).

The 1-, 2-, and 3-nearest neighbors of an instance x. In (b), we may randomly choose one of the class labels (i.e., + or −) to classify the data point x.

The Euclidean distance between two points or tuples $X_1 = (x_{11}, x_{12}, \ldots, x_{1n})$ and $X_2 = (x_{21}, x_{22}, \ldots, x_{2n})$ is defined as $dist(X_1, X_2) = \sqrt{\sum_{i=1}^{n} (x_{1i} - x_{2i})^2}$. Other distance metrics (e.g., Manhattan, Minkowski, Cosine, and Mahalanobis distance) can be used.
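As a concrete illustration (not part of the original slides), a minimal Python sketch of this distance computation, assuming the two tuples are equal-length sequences of numeric attribute values:

import math

def euclidean_distance(x1, x2):
    # Square root of the sum of squared attribute differences
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))

# Example: two tuples with three numeric attributes
print(euclidean_distance((1.0, 2.0, 3.0), (4.0, 6.0, 3.0)))  # 5.0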

Choosing the right value for k is important. - If k is too small, the k-NN classifier may be susceptible to overfitting because of noise in the training data. - If k is too large, the k-NN classifier may misclassify the test instance because its list of nearest neighbors may include data points that are located far away from its neighborhood, as shown below.

k-NN classification with a large k (x is classified as − instead of +).
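To make this sensitivity concrete, here is a small Python sketch on invented toy data (purely illustrative, not from the slides): with k = 1 the test point follows its immediate neighborhood, while a large k lets many distant points of the other class outvote it.

import math
from collections import Counter

def knn_predict(train, test_point, k):
    # train is a list of (feature_tuple, class_label) pairs
    neighbors = sorted(train, key=lambda ex: math.dist(ex[0], test_point))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

# Two '+' examples near the test point, ten '-' examples far away
train = [((1.0, 1.0), '+'), ((1.2, 0.9), '+')]
train += [((5.0 + i, 5.0), '-') for i in range(10)]
print(knn_predict(train, (1.1, 1.0), k=1))   # '+'  (local neighborhood decides)
print(knn_predict(train, (1.1, 1.0), k=11))  # '-'  (distant points dominate)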

Algorithm v1: Basic k-NN classification algorithm
1. Find the k training instances that are closest to the unseen instance.
2. Take the most commonly occurring class label among these k instances and assign it as the class label of the unseen instance.

Algorithm v2: Basic k-NN classification algorithm
1. Let k be the number of nearest neighbors and D be the set of training examples.
2. for each test example $z = (x', y')$ do
3. Compute $d(x', x)$, the distance between z and every example $(x, y) \in D$.
4. Select $D_z \subseteq D$, the set of the k closest training examples to z.
5. $y' = \arg\max_{v} \sum_{(x_i, y_i) \in D_z} I(v = y_i)$
6. end for

Once the k-NN list $D_z$ is obtained, the test example is classified based on the majority class of its k nearest neighbors: $y' = \arg\max_{v} \sum_{(x_i, y_i) \in D_z} I(v = y_i)$, where v is a class label, $y_i$ is the class label of one of the k nearest neighbors, and $I(\cdot)$ is an indicator function that returns the value 1 if its argument is true and 0 otherwise.
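A minimal Python sketch of this majority-vote step (illustrative only; the function name and the (feature_tuple, class_label) representation of the training data are assumptions, not from the slides):

import math

def majority_vote_knn(train, x_test, k):
    # Steps 3-4: keep the k training examples closest to x_test (the set D_z)
    d_z = sorted(train, key=lambda ex: math.dist(ex[0], x_test))[:k]
    # Step 5: y' = argmax_v of the count of neighbors whose label equals v
    classes = {label for _, label in d_z}
    return max(classes, key=lambda v: sum(1 for _, y_i in d_z if y_i == v))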

In the majority voting approach, every neighbor has the same impact on the classification. This makes the algorithm sensitive to the choice of k.

One way to reduce the impact of k is to weight the influence of each nearest neighbor $x_i$ according to its distance: $w_i = 1 / d(x', x_i)^2$. As a result, training examples that are located far away from z have a weaker impact on the classification compared to those that are located close to z.

Using the distance-weighted voting scheme, the class label can be determined as follows: $y' = \arg\max_{v} \sum_{(x_i, y_i) \in D_z} w_i \times I(v = y_i)$.
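A hedged Python sketch of this distance-weighted variant (same assumed data representation as above; the small eps term is an addition of this sketch, to avoid division by zero when a neighbor coincides with the test point):

import math

def weighted_vote_knn(train, x_test, k, eps=1e-12):
    # Keep the k nearest neighbors D_z
    d_z = sorted(train, key=lambda ex: math.dist(ex[0], x_test))[:k]
    # Accumulate w_i = 1 / d(x', x_i)^2 per class label
    votes = {}
    for x_i, y_i in d_z:
        w_i = 1.0 / (math.dist(x_i, x_test) ** 2 + eps)
        votes[y_i] = votes.get(y_i, 0.0) + w_i
    # Return the class with the largest total weight
    return max(votes, key=votes.get)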

k-NN classifiers can produce wrong predictions when the attribute values of tuples have very different scales. For example, suppose we want to classify a group of people based on attributes such as height (measured in meters) and weight (measured in pounds).

The height attribute has low variability, ranging from 1.5 m to 1.85 m, whereas the weight attribute may vary from 90 lb to 250 lb. If the scales of the attributes are not taken into consideration, the proximity measure may be dominated by differences in the people's weights.

Data normalization (aka feature scaling): we normalize the values of each attribute before computing the proximity measure (e.g., Euclidean distance). - This helps prevent attributes with large ranges (e.g., weight) from outweighing attributes with smaller ranges (e.g., height).

Min-max normalization (aka unity-based normalization) can be used to transform a value v of a numeric attribute A to a value v' in the range [0, 1] by computing $v' = (v - min_A) / (max_A - min_A) \in [0, 1]$, where $min_A$ and $max_A$ are the minimum and maximum values of attribute A.

In general, min-max normalization (aka unity-based normalization) can be used to transform a value v of a numeric attribute A to a value v' in a target range [l, u] by computing $v' = l + \frac{v - min_A}{max_A - min_A}\,(u - l) \in [l, u]$, where $min_A$ and $max_A$ are the minimum and maximum values of attribute A.

Note that an unseen instance may have a value of A that is less than $min_A$ or greater than $max_A$. If we want to keep the adjusted numbers in the range from 0 to 1, we can simply clip: convert any value of A that is less than $min_A$ to 0 and any value greater than $max_A$ to 1.
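A short Python sketch of min-max scaling with this clipping behavior (illustrative; the bounds min_a and max_a are assumed to come from the training data):

def min_max_normalize(v, min_a, max_a, low=0.0, high=1.0):
    # Scale v from [min_a, max_a] into [low, high]
    scaled = low + (v - min_a) / (max_a - min_a) * (high - low)
    # Clip unseen values that fall below min_a or above max_a
    return max(low, min(high, scaled))

# Example: weight in pounds, observed between 90 and 250 in the training data
print(min_max_normalize(170, 90, 250))  # 0.5
print(min_max_normalize(300, 90, 250))  # 1.0 (clipped)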

Dealing with non-numeric attributes: for non-numeric (e.g., nominal or categorical) attributes, a simple method is to compare the corresponding value of the non-numeric attribute in tuple $X_1$ with that in tuple $X_2$. - If the two are identical (e.g., tuples $X_1$ and $X_2$ both have the color blue), the difference between the two is 0. - If the two are different (e.g., tuple $X_1$ is blue but tuple $X_2$ is red), the difference is 1.
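A brief Python sketch combining this 0/1 rule for nominal attributes with (already normalized) numeric differences; the dictionary representation and the attribute names are illustrative assumptions, not from the slides:

import math

def mixed_distance(t1, t2, numeric_attrs):
    # Tuples are dicts of attribute name -> value; numeric_attrs lists the numeric ones
    total = 0.0
    for attr in t1:
        if attr in numeric_attrs:
            diff = t1[attr] - t2[attr]                    # normalized numeric difference
        else:
            diff = 0.0 if t1[attr] == t2[attr] else 1.0   # nominal: 0 if equal, else 1
        total += diff ** 2
    return math.sqrt(total)

# Example: height already min-max normalized; color is nominal
x1 = {"height": 0.40, "color": "blue"}
x2 = {"height": 0.55, "color": "red"}
print(mixed_distance(x1, x2, numeric_attrs={"height"}))  # sqrt(0.15**2 + 1**2)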

2. Classification Using Frequent Patterns.

3. Support Vector Machines (SVMs).

4. Classification by Backpropagation (ANNs).

5. Bayesian Belief Networks.

6. Other Classification Methods
- Genetic Algorithms (GAs)
- Rough Set Approach
- Fuzzy Set Approach

Summary

Exercises.

References
1. Jiawei Han, Micheline Kamber, Jian Pei. 2011. Data Mining: Concepts and Techniques. 3rd Ed. Morgan Kaufmann. ISBN: 9380931913.
2. Pang-Ning Tan, Michael Steinbach, Vipin Kumar. 2005. Introduction to Data Mining. 1st Ed. Pearson. ISBN: 0321321367.
3. Charu C. Aggarwal. 2015. Data Mining: The Textbook. Springer. ISBN: 3319141414.

4. Nong Ye. 2013. Data Mining: Theories, Algorithms, and Examples. CRC Press. ISBN: 1439808384.
5. Uday Kamath, Krishna Choppella. 2017. Mastering Java Machine Learning. Packt Publishing. ISBN: 1785880519.

Extra Slides: Distance Metrics
1. The Euclidean distance between two points $x = (x_1, x_2, \ldots, x_d)$ and $y = (y_1, y_2, \ldots, y_d)$ is defined as $L_2(x, y) = \|x - y\|_2 = \sqrt{\sum_{i=1}^{d} (x_i - y_i)^2}$. https://en.wikipedia.org/wiki/euclidean_distance

2. The Manhattan distance between two points $x = (x_1, x_2, \ldots, x_d)$ and $y = (y_1, y_2, \ldots, y_d)$ is defined as $L_1(x, y) = \|x - y\|_1 = \sum_{i=1}^{d} |x_i - y_i|$ (the sum of the absolute differences of their Cartesian coordinates). https://en.wikipedia.org/wiki/taxicab_geometry

3. The Minkowski distance between two points $x = (x_1, x_2, \ldots, x_d)$ and $y = (y_1, y_2, \ldots, y_d)$ is defined as $L_p(x, y) = \|x - y\|_p = \left(\sum_{i=1}^{d} |x_i - y_i|^p\right)^{1/p}$, a generalization of both the Euclidean distance (p = 2) and the Manhattan distance (p = 1). https://en.wikipedia.org/wiki/minkowski_distance

4. The cosine distance between two points $x = (x_1, x_2, \ldots, x_d)$ and $y = (y_1, y_2, \ldots, y_d)$ is defined as $dist_{\cos}(x, y) = 1 - \frac{x \cdot y}{\|x\|\,\|y\|}$, where the dot (or inner) product is $x \cdot y = \sum_{i=1}^{d} x_i y_i$ and the length (or magnitude) of a vector is $\|x\| = \sqrt{\sum_{i=1}^{d} x_i^2}$. https://en.wikipedia.org/wiki/cosine_similarity

5. The Mahalanobis distance between two points $x = (x_1, x_2, \ldots, x_d)$ and $y = (y_1, y_2, \ldots, y_d)$ is defined as $d_M(x, y) = \sqrt{(x - y)^T S^{-1} (x - y)}$, where S is a covariance matrix (also denoted $\Sigma$), $S^{-1}$ is the inverse of S, and $(x - y)^T$ is the transpose of $(x - y)$. https://en.wikipedia.org/wiki/mahalanobis_distance
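For quick experimentation, all five metrics are available in SciPy; a minimal sketch with made-up 3-dimensional vectors (scipy.spatial.distance.mahalanobis expects the inverse covariance matrix, here the identity purely for illustration):

import numpy as np
from scipy.spatial import distance

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 6.0, 3.0])

print(distance.euclidean(x, y))       # L2 distance
print(distance.cityblock(x, y))       # L1 (Manhattan) distance
print(distance.minkowski(x, y, p=3))  # Lp distance with p = 3
print(distance.cosine(x, y))          # cosine distance = 1 - cosine similarity

# Mahalanobis needs the inverse covariance matrix of the data; using the
# identity here reduces it to the Euclidean distance
VI = np.linalg.inv(np.eye(3))
print(distance.mahalanobis(x, y, VI))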
