Probabilistic Classifiers DWML, /27

Size: px
Start display at page:

Download "Probabilistic Classifiers DWML, /27"

Transcription

1 Probabilistic Classifiers DWML, /27

2 Probabilistic Classifiers Conditional class probabilities Id. Savings Assets Income Credit risk 1 Medium High 75 Good 2 Low Low 50 Bad 3 High Medium 25 Bad 4 Medium High 75 Good 5 Low Medium 100 Good 6 High High 25 Good 7 Medium High 75 Bad 8 Medium Medium 75 Good DWML, /27

3 Probabilistic Classifiers Conditional class probabilities Id. Savings Assets Income Credit risk 1 Medium High 75 Good 2 Low Low 50 Bad 3 High Medium 25 Bad 4 Medium High 75 Good 5 Low Medium 100 Good 6 High High 25 Good 7 Medium High 75 Bad 8 Medium Medium 75 Good P(Risk = Good Savings = Medium, Assets = High, Income = 75) = 2/3 P(Risk = Bad Savings = Medium, Assets = High, Income = 75) = 1/3 DWML, /27

4 Probabilistic Classifiers Empirical Distribution The training data defines the empirircal distribution, which can be represented in a table. Empirical distribution obtained from 1000 data instances: Gender Blood Pressure Weight Smoker Stroke P m low under no no 32/1000 m low under no yes 1/1000 m low under yes no 27/ f normal normal no yes 0/ f high over yes yes 54/1000 Such a table is not a suitable probabilistic model, because Size of representation It overfits the data DWML, /27

5 Probabilistic Classifiers Model View data as being produced by a random process that is described by a joint probability distribution P on States(A 1,..., A n, C), i.e. P assigns a probability P(a 1,..., a n, c) [0,1] to every tuple (a 1,..., a n, c) of values for the attribute and class variables, s.t. X (a 1,...,a n,c) States(A 1,...,A n,c) P(a 1,..., a n, c) = 1 (for discrete attributes; integration instead of summation for continuous attributes) Conditional Probability The joint distribution P also defines the conditional probability distribution of C, given A 1,..., A n, i.e. values P(c a 1,..., a n ) := P(a 1,..., a n, c) P(a 1,..., a n ) = P(a 1,..., a n, c) P c P(a 1,..., a n, c ) that represent the probability that C = c given that it is known that A 1 = a 1,..., A n = a n. DWML, /27

6 Probabilistic Classifiers Classification Rule C(a 1,..., a n ) := arg max c States(C) P(c a 1,..., a n ) In binary case, e.g. States(C) = {not infected, infected}, also with variable threshold t: C(a 1,..., a n ) = not infected : P(not infected a 1,..., a n ) t. (this can also be generalized for non-binary attributes). DWML, /27

7 Naive Bayes The Naive Bayes Model Structural assumption: P(a 1,..., a n, c) = P(a 1 c) P(a 2 c) P(a n c) P(c) Graphical representation as a Bayesian Network: C A 1 A 2 A 3 A 4 A 5 A 6 A 7 Interpretation: Given the true class labels, the different attributes take their value independently. DWML, /27

8 Naive Bayes The naive Bayes assumption I For example: P(Cell-2 = b Cell-5 = b, Symbol = 1) > P(Cell-2 = b Symbol = 1) Attributes not independent given Symbol=1! DWML, /27

9 Naive Bayes The naive Bayes assumption II For spam example e.g.: P(Body nigeria =y Body confidential =y, Spam=y) P(Body nigeria =y Spam=y) Attributes not independent given Spam=yes! Naive Bayes assumption often not realistic. Nevertheless, Naive Bayes often successful. DWML, /27

10 Naive Bayes Learning a Naive Bayes Classifier Determine parameters P(a i c) (a i States(A i ), c States(C)) from empirical counts in the data. Missing values are easily handled: instances for which A i is missing are ignored for P(a i c). Discrete and continuous attributes can be mixed. DWML, /27

11 Naive Bayes The paradoxical success of Naive Bayes One explanation for the surprisingly good performance of Naive Bayes in many domains: do not require exact distribution for classification, only the right decision boundaries [Domingos, Pazzani 97] 1 : P(C = a 1,..., a n ) (real) States(A 1,..., A n ) DWML, /27

12 Naive Bayes The paradoxical success of Naive Bayes One explanation for the surprisingly good performance of Naive Bayes in many domains: do not require exact distribution for classification, only the right decision boundaries [Domingos, Pazzani 97] : P(C = a 1,..., a n ) (real) : P(C = a 1,..., a n ) (Naive Bayes) States(A 1,..., A n ) DWML, /27

13 Naive Bayes When Naive Bayes must fail No Naive Bayes Classifier can produce the following classification: because assume it did, then: A B Class yes yes yes no no yes no no 1. P(A = y )P(B = y )P() > P(A = y )P(B = y )P( ) 2. P(A = y )P(B = n )P( ) > P(A = y )P(B = n )P() 3. P(A = n )P(B = y )P( ) > P(A = n )P(B = y )P() 4. P(A = n )P(B = n )P() > P(A = n )P(B = n )P( ) DWML, /27

14 Naive Bayes Multiplying the four left sides and the four right sides of these inequalities: 4Y (left side of i.) > i=1 4Y (right side of i.) i=1 But this is false, because both products are actually equal. DWML, /27

15 Naive Bayes Tree Augmented Naive Bayes A 2 Model: all Bayesian network structures where A 7 - The class node is parent of each attribute node C A 3 - The substructure on the attribute nodes is a tree A 1 A 4 A 5 A 6 Learning TAN classifier: learning the tree structure and parameters. Optimal tree structure can be found efficiently (Chow, Liu 1968, Friedman et al. 1997). DWML, /27

16 Naive Bayes A B Class TAN classifier for yes yes yes no no yes no no : A C yes no C C A yes no B yes no yes no DWML, /27

17 Evaluating Classifiers DWML, /27

18 Evaluating Classifiers Validation Evaluation: estimate of the performance of a classifier on future data. Estimate obtained by measuring performance on validation set (distinct from test set used for parameter tuning!); or by cross-validation. Classification Error Classifier C (e.g. decision tree) is used to classify instances a 1,...,a N with true class labels c 1,..., c N. Class labels assigned by C : c 1,..., c N. Classification error: {i 1,..., N c i c i } /N DWML, /27

19 Evaluating Classifiers Expected Loss A more detailed picture is provided by the confusion matrix and a cost function: (e.g. for States(C) = {a, b, c} and n = 150): true predicted a b c a 45/150 4/150 3/150 b 2/150 39/150 1/150 c 3/150 7/150 46/150 true predicted a b c a b c Confusion matrix: Fractions of cases with true/predicted combination Loss matrix Expected Loss: X x,y {a,b,c} Confusion(x, y) Loss(x, y) When cost function given, try to minimize expected loss (minimizing classification error is special case for 0-1 loss: Loss(x, x) = 0 and Loss(x, y) = 1 for x y)! DWML, /27

20 Evaluating Classifiers Classifiers with Confidence Most classifiers (implicitly) provide a numeric measurement for the likelihood of class label c for instance a: Probabilistic classifier: Probability of c given a. Decision Tree: Frequency of label c (among training cases) in leaf reached by a. k-nearest-neighbor: Frequency of label c among k nearest neighbors of a. Neural Network: Output value of c output neuron given input a. DWML, /27

21 Evaluating Classifiers Quantiles For a given class label c sort instances according to decreasing confidence in c: Instance: a 3 a 5 a 1 a 7 a 8 a 4 a 2 a 10 a 6 a 9 P(c): DWML, /27

22 Evaluating Classifiers Quantiles For a given class label c sort instances according to decreasing confidence in c: Instance: a 3 a 5 a 1 a 7 a 8 a 4 a 2 a 10 a 6 a 9 P(c): The 40% quantile consists of the 40% of cases with highest confidence in c. DWML, /27

23 Evaluating Classifiers Quantiles For a given class label c sort instances according to decreasing confidence in c: Instance: a 3 a 5 a 1 a 7 a 8 a 4 a 2 a 10 a 6 a 9 P(c): c i = c: yes yes no yes yes no yes no no no The 40% quantile consists of the 40% of cases with highest confidence in c. Given the correct class labels, can compute accuracy in 40% quantile (3/4), and ratio of this accuracy and base rate of c label: Lift(40%, C, c) = 3/4 5/10 = 1.5 DWML, /27

24 Evaluating Classifiers Lift plotted for different quantiles: Lift Charts 2 Lift(C, c) DWML, /27

25 Evaluating Classifiers Lift Charts Lift plotted for different quantiles: Lift(C, c) Lift(C, c) Lift for a classifier C generating a perfect ordering: Instance: a 7 a 5 a 2 a 3 a 8 a 9 a 1 a 10 a 6 a 4 P(c): c i = c: yes yes yes yes yes no no no no no DWML, /27

26 Evaluating Classifiers Lift and Costs What is better predicting C = c for all instances in the 40% quantile (say lift=1.5), and C c for all others, or predicting C = c for all instances in the 60% quantile (say lift=1.333), and C c for all others? That depends on the cost function! First option will be better when wrong predictions of C = c are very expensive, second option will be better when wrong predictions of C c are very expensive. DWML, /27

27 Evaluating Classifiers ROC Space Confusion matrix for binary classification problems: True positive rate (tpr): tp/(tp + fn) False positive rate (fpr): fp/(fp + tn) true predicted pos neg pos true positives (tp) false positives(fp) neg false negatives(fn) true negatives (tn) Each classifier (applied to some dataset) defines a point in ROC space: 1 tpr 0 fpr 1 DWML, /27

28 Evaluating Classifiers ROC Space Confusion matrix for binary classification problems: True positive rate (tpr): tp/(tp + fn) False positive rate (fpr): fp/(fp + tn) true predicted pos neg pos true positives (tp) false positives(fp) neg false negatives(fn) true negatives (tn) Each classifier (applied to some dataset) defines a point in ROC space: 1 always classify positive tpr 0 fpr 1 DWML, /27

29 Evaluating Classifiers ROC Space Confusion matrix for binary classification problems: True positive rate (tpr): tp/(tp + fn) False positive rate (fpr): fp/(fp + tn) true predicted pos neg pos true positives (tp) false positives(fp) neg false negatives(fn) true negatives (tn) Each classifier (applied to some dataset) defines a point in ROC space: 1 always classify positive always classify negative tpr 0 fpr 1 DWML, /27

30 Evaluating Classifiers ROC Space Confusion matrix for binary classification problems: True positive rate (tpr): tp/(tp + fn) False positive rate (fpr): fp/(fp + tn) true predicted pos neg pos true positives (tp) false positives(fp) neg false negatives(fn) true negatives (tn) Each classifier (applied to some dataset) defines a point in ROC space: 1 always classify positive always classify negative tpr classify positive with probability q q 0 q fpr 1 DWML, /27

31 Evaluating Classifiers ROC Space Confusion matrix for binary classification problems: True positive rate (tpr): tp/(tp + fn) False positive rate (fpr): fp/(fp + tn) true predicted pos neg pos true positives (tp) false positives(fp) neg false negatives(fn) true negatives (tn) Each classifier (applied to some dataset) defines a point in ROC space: 1 always classify positive always classify negative tpr q classify positive with probability q perfect classification 0 q fpr 1 DWML, /27

32 Evaluating Classifiers Comparison One classifier is strictly better than another, if its tpr/fpr point is to the left and above in ROC space: 1 C 1 C 2 tpr C 3 0 fpr 1 C 1 better than C 2. C 3 incomparable with C 1 and C 2. DWML, /27

33 Evaluating Classifiers ROC curves Probabilistic classifiers (and many others) are parameterized by an acceptance threshold. Plotting the tpr/fpr values for all parameters (and a given dataset) gives a ROC curve: 1 tpr 0 fpr 1 DWML, /27

34 Evaluating Classifiers ROC curves Probabilistic classifiers (and many others) are parameterized by an acceptance threshold. Plotting the tpr/fpr values for all parameters (and a given dataset) gives a ROC curve: 1 tpr 0 fpr 1 Performance measure for (parameterized family) classifier: area under (ROC) curve (AUC). DWML, /27

35 Optimizing Predictive Performance Overfitting again Performance Measure Model parameter Future data Training data Possible performance measures: Misclassification rate Expected loss Model parameter: Pruning parameter for decision trees k in k-nearest neighbor AUC... Complexity of probabilistic model (e.g. Naive Bayes, TAN,... )... How do we determine the model performing best on future data? DWML, /27

36 Optimizing Predictive Performance Test Set Set aside part (e.g. one third) of the available data as a test set Learn models with different parameters using the remaining data as the training data Measure the performance of each learned model on the test set Choose parameter setting with best performance Learn final model with chosen parameter setting using the whole available data Problem: for small datasets cannot afford to set aside test set. DWML, /27

37 Optimizing Predictive Performance Cross Validation Partition the data into n subsets or folds (typically: n = 10). For each model parameter setting: - for i = 1 to n: learn a model using folds 1,..., i 1, i + 1,..., n as training data measure performance on fold i - model performance = average performance on the n test sets Choose parameter setting with best performance Learn final model with chosen parameter setting using the whole available data DWML, /27

Data Mining Classification: Alternative Techniques. Imbalanced Class Problem

Data Mining Classification: Alternative Techniques. Imbalanced Class Problem Data Mining Classification: Alternative Techniques Imbalanced Class Problem Introduction to Data Mining, 2 nd Edition by Tan, Steinbach, Karpatne, Kumar Class Imbalance Problem Lots of classification problems

More information

List of Exercises: Data Mining 1 December 12th, 2015

List of Exercises: Data Mining 1 December 12th, 2015 List of Exercises: Data Mining 1 December 12th, 2015 1. We trained a model on a two-class balanced dataset using five-fold cross validation. One person calculated the performance of the classifier by measuring

More information

Evaluating Classifiers

Evaluating Classifiers Evaluating Classifiers Reading for this topic: T. Fawcett, An introduction to ROC analysis, Sections 1-4, 7 (linked from class website) Evaluating Classifiers What we want: Classifier that best predicts

More information

Evaluating Classifiers

Evaluating Classifiers Evaluating Classifiers Reading for this topic: T. Fawcett, An introduction to ROC analysis, Sections 1-4, 7 (linked from class website) Evaluating Classifiers What we want: Classifier that best predicts

More information

Data Mining and Knowledge Discovery Practice notes 2

Data Mining and Knowledge Discovery Practice notes 2 Keywords Data Mining and Knowledge Discovery: Practice Notes Petra Kralj Novak Petra.Kralj.Novak@ijs.si Data Attribute, example, attribute-value data, target variable, class, discretization Algorithms

More information

Data Mining and Knowledge Discovery: Practice Notes

Data Mining and Knowledge Discovery: Practice Notes Data Mining and Knowledge Discovery: Practice Notes Petra Kralj Novak Petra.Kralj.Novak@ijs.si 2016/11/16 1 Keywords Data Attribute, example, attribute-value data, target variable, class, discretization

More information

Data Mining and Knowledge Discovery: Practice Notes

Data Mining and Knowledge Discovery: Practice Notes Data Mining and Knowledge Discovery: Practice Notes Petra Kralj Novak Petra.Kralj.Novak@ijs.si 8.11.2017 1 Keywords Data Attribute, example, attribute-value data, target variable, class, discretization

More information

CS6375: Machine Learning Gautam Kunapuli. Mid-Term Review

CS6375: Machine Learning Gautam Kunapuli. Mid-Term Review Gautam Kunapuli Machine Learning Data is identically and independently distributed Goal is to learn a function that maps to Data is generated using an unknown function Learn a hypothesis that minimizes

More information

Evaluation Measures. Sebastian Pölsterl. April 28, Computer Aided Medical Procedures Technische Universität München

Evaluation Measures. Sebastian Pölsterl. April 28, Computer Aided Medical Procedures Technische Universität München Evaluation Measures Sebastian Pölsterl Computer Aided Medical Procedures Technische Universität München April 28, 2015 Outline 1 Classification 1. Confusion Matrix 2. Receiver operating characteristics

More information

Classification Algorithms in Data Mining

Classification Algorithms in Data Mining August 9th, 2016 Suhas Mallesh Yash Thakkar Ashok Choudhary CIS660 Data Mining and Big Data Processing -Dr. Sunnie S. Chung Classification Algorithms in Data Mining Deciding on the classification algorithms

More information

DATA MINING LECTURE 11. Classification Basic Concepts Decision Trees Evaluation Nearest-Neighbor Classifier

DATA MINING LECTURE 11. Classification Basic Concepts Decision Trees Evaluation Nearest-Neighbor Classifier DATA MINING LECTURE 11 Classification Basic Concepts Decision Trees Evaluation Nearest-Neighbor Classifier What is a hipster? Examples of hipster look A hipster is defined by facial hair Hipster or Hippie?

More information

Evaluating Machine-Learning Methods. Goals for the lecture

Evaluating Machine-Learning Methods. Goals for the lecture Evaluating Machine-Learning Methods Mark Craven and David Page Computer Sciences 760 Spring 2018 www.biostat.wisc.edu/~craven/cs760/ Some of the slides in these lectures have been adapted/borrowed from

More information

Data Mining Concepts & Techniques

Data Mining Concepts & Techniques Data Mining Concepts & Techniques Lecture No. 03 Data Processing, Data Mining Naeem Ahmed Email: naeemmahoto@gmail.com Department of Software Engineering Mehran Univeristy of Engineering and Technology

More information

Logistic Regression: Probabilistic Interpretation

Logistic Regression: Probabilistic Interpretation Logistic Regression: Probabilistic Interpretation Approximate 0/1 Loss Logistic Regression Adaboost (z) SVM Solution: Approximate 0/1 loss with convex loss ( surrogate loss) 0-1 z = y w x SVM (hinge),

More information

Data Mining and Knowledge Discovery Practice notes: Numeric Prediction, Association Rules

Data Mining and Knowledge Discovery Practice notes: Numeric Prediction, Association Rules Keywords Data Mining and Knowledge Discovery: Practice Notes Petra Kralj Novak Petra.Kralj.Novak@ijs.si Data Attribute, example, attribute-value data, target variable, class, discretization Algorithms

More information

Classification. Instructor: Wei Ding

Classification. Instructor: Wei Ding Classification Part II Instructor: Wei Ding Tan,Steinbach, Kumar Introduction to Data Mining 4/18/004 1 Practical Issues of Classification Underfitting and Overfitting Missing Values Costs of Classification

More information

CS4491/CS 7265 BIG DATA ANALYTICS

CS4491/CS 7265 BIG DATA ANALYTICS CS4491/CS 7265 BIG DATA ANALYTICS EVALUATION * Some contents are adapted from Dr. Hung Huang and Dr. Chengkai Li at UT Arlington Dr. Mingon Kang Computer Science, Kennesaw State University Evaluation for

More information

Building Classifiers using Bayesian Networks

Building Classifiers using Bayesian Networks Building Classifiers using Bayesian Networks Nir Friedman and Moises Goldszmidt 1997 Presented by Brian Collins and Lukas Seitlinger Paper Summary The Naive Bayes classifier has reasonable performance

More information

Part II: A broader view

Part II: A broader view Part II: A broader view Understanding ML metrics: isometrics, basic types of linear isometric plots linear metrics and equivalences between them skew-sensitivity non-linear metrics Model manipulation:

More information

DATA MINING LECTURE 9. Classification Basic Concepts Decision Trees Evaluation

DATA MINING LECTURE 9. Classification Basic Concepts Decision Trees Evaluation DATA MINING LECTURE 9 Classification Basic Concepts Decision Trees Evaluation What is a hipster? Examples of hipster look A hipster is defined by facial hair Hipster or Hippie? Facial hair alone is not

More information

Data Warehousing and Machine Learning

Data Warehousing and Machine Learning Data Warehousing and Machine Learning Introduction Thomas D. Nielsen Aalborg University Department of Computer Science Spring 2008 DWML Spring 2008 1 / 47 What is Data Mining?? Introduction DWML Spring

More information

CS145: INTRODUCTION TO DATA MINING

CS145: INTRODUCTION TO DATA MINING CS145: INTRODUCTION TO DATA MINING 08: Classification Evaluation and Practical Issues Instructor: Yizhou Sun yzsun@cs.ucla.edu October 24, 2017 Learnt Prediction and Classification Methods Vector Data

More information

CLASSIFICATION JELENA JOVANOVIĆ. Web:

CLASSIFICATION JELENA JOVANOVIĆ.   Web: CLASSIFICATION JELENA JOVANOVIĆ Email: jeljov@gmail.com Web: http://jelenajovanovic.net OUTLINE What is classification? Binary and multiclass classification Classification algorithms Naïve Bayes (NB) algorithm

More information

Metrics for Performance Evaluation How to evaluate the performance of a model? Methods for Performance Evaluation How to obtain reliable estimates?

Metrics for Performance Evaluation How to evaluate the performance of a model? Methods for Performance Evaluation How to obtain reliable estimates? Model Evaluation Metrics for Performance Evaluation How to evaluate the performance of a model? Methods for Performance Evaluation How to obtain reliable estimates? Methods for Model Comparison How to

More information

Naïve Bayes Classification. Material borrowed from Jonathan Huang and I. H. Witten s and E. Frank s Data Mining and Jeremy Wyatt and others

Naïve Bayes Classification. Material borrowed from Jonathan Huang and I. H. Witten s and E. Frank s Data Mining and Jeremy Wyatt and others Naïve Bayes Classification Material borrowed from Jonathan Huang and I. H. Witten s and E. Frank s Data Mining and Jeremy Wyatt and others Things We d Like to Do Spam Classification Given an email, predict

More information

Evaluating Classifiers

Evaluating Classifiers Evaluating Classifiers Charles Elkan elkan@cs.ucsd.edu January 18, 2011 In a real-world application of supervised learning, we have a training set of examples with labels, and a test set of examples with

More information

CS 229 Midterm Review

CS 229 Midterm Review CS 229 Midterm Review Course Staff Fall 2018 11/2/2018 Outline Today: SVMs Kernels Tree Ensembles EM Algorithm / Mixture Models [ Focus on building intuition, less so on solving specific problems. Ask

More information

Data Mining Classification: Bayesian Decision Theory

Data Mining Classification: Bayesian Decision Theory Data Mining Classification: Bayesian Decision Theory Lecture Notes for Chapter 2 R. O. Duda, P. E. Hart, and D. G. Stork, Pattern classification, 2nd ed. New York: Wiley, 2001. Lecture Notes for Chapter

More information

Evaluation of different biological data and computational classification methods for use in protein interaction prediction.

Evaluation of different biological data and computational classification methods for use in protein interaction prediction. Evaluation of different biological data and computational classification methods for use in protein interaction prediction. Yanjun Qi, Ziv Bar-Joseph, Judith Klein-Seetharaman Protein 2006 Motivation Correctly

More information

CS 584 Data Mining. Classification 3

CS 584 Data Mining. Classification 3 CS 584 Data Mining Classification 3 Today Model evaluation & related concepts Additional classifiers Naïve Bayes classifier Support Vector Machine Ensemble methods 2 Model Evaluation Metrics for Performance

More information

Data Mining and Knowledge Discovery: Practice Notes

Data Mining and Knowledge Discovery: Practice Notes Data Mining and Knowledge Discovery: Practice Notes Petra Kralj Novak Petra.Kralj.Novak@ijs.si 2013/12/09 1 Practice plan 2013/11/11: Predictive data mining 1 Decision trees Evaluating classifiers 1: separate

More information

Evaluating Machine Learning Methods: Part 1

Evaluating Machine Learning Methods: Part 1 Evaluating Machine Learning Methods: Part 1 CS 760@UW-Madison Goals for the lecture you should understand the following concepts bias of an estimator learning curves stratified sampling cross validation

More information

Weka ( )

Weka (  ) Weka ( http://www.cs.waikato.ac.nz/ml/weka/ ) The phases in which classifier s design can be divided are reflected in WEKA s Explorer structure: Data pre-processing (filtering) and representation Supervised

More information

Classification Part 4

Classification Part 4 Classification Part 4 Dr. Sanjay Ranka Professor Computer and Information Science and Engineering University of Florida, Gainesville Model Evaluation Metrics for Performance Evaluation How to evaluate

More information

CPSC 340: Machine Learning and Data Mining. Non-Parametric Models Fall 2016

CPSC 340: Machine Learning and Data Mining. Non-Parametric Models Fall 2016 CPSC 340: Machine Learning and Data Mining Non-Parametric Models Fall 2016 Admin Course add/drop deadline tomorrow. Assignment 1 is due Friday. Setup your CS undergrad account ASAP to use Handin: https://www.cs.ubc.ca/getacct

More information

Use of Synthetic Data in Testing Administrative Records Systems

Use of Synthetic Data in Testing Administrative Records Systems Use of Synthetic Data in Testing Administrative Records Systems K. Bradley Paxton and Thomas Hager ADI, LLC 200 Canal View Boulevard, Rochester, NY 14623 brad.paxton@adillc.net, tom.hager@adillc.net Executive

More information

Data Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation. Lecture Notes for Chapter 4. Introduction to Data Mining

Data Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation. Lecture Notes for Chapter 4. Introduction to Data Mining Data Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation Lecture Notes for Chapter 4 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data

More information

DATA MINING LECTURE 9. Classification Decision Trees Evaluation

DATA MINING LECTURE 9. Classification Decision Trees Evaluation DATA MINING LECTURE 9 Classification Decision Trees Evaluation 10 10 Illustrating Classification Task Tid Attrib1 Attrib2 Attrib3 Class 1 Yes Large 125K No 2 No Medium 100K No 3 No Small 70K No 4 Yes Medium

More information

DATA MINING OVERFITTING AND EVALUATION

DATA MINING OVERFITTING AND EVALUATION DATA MINING OVERFITTING AND EVALUATION 1 Overfitting Will cover mechanisms for preventing overfitting in decision trees But some of the mechanisms and concepts will apply to other algorithms 2 Occam s

More information

Chapter 3: Supervised Learning

Chapter 3: Supervised Learning Chapter 3: Supervised Learning Road Map Basic concepts Evaluation of classifiers Classification using association rules Naïve Bayesian classification Naïve Bayes for text classification Summary 2 An example

More information

Data Mining and Knowledge Discovery: Practice Notes

Data Mining and Knowledge Discovery: Practice Notes Data Mining and Knowledge Discovery: Practice Notes Petra Kralj Novak Petra.Kralj.Novak@ijs.si 2016/01/12 1 Keywords Data Attribute, example, attribute-value data, target variable, class, discretization

More information

Classification. Slide sources:

Classification. Slide sources: Classification Slide sources: Gideon Dror, Academic College of TA Yaffo Nathan Ifill, Leicester MA4102 Data Mining and Neural Networks Andrew Moore, CMU : http://www.cs.cmu.edu/~awm/tutorials 1 Outline

More information

Naïve Bayes Classification. Material borrowed from Jonathan Huang and I. H. Witten s and E. Frank s Data Mining and Jeremy Wyatt and others

Naïve Bayes Classification. Material borrowed from Jonathan Huang and I. H. Witten s and E. Frank s Data Mining and Jeremy Wyatt and others Naïve Bayes Classification Material borrowed from Jonathan Huang and I. H. Witten s and E. Frank s Data Mining and Jeremy Wyatt and others Things We d Like to Do Spam Classification Given an email, predict

More information

Network Traffic Measurements and Analysis

Network Traffic Measurements and Analysis DEIB - Politecnico di Milano Fall, 2017 Sources Hastie, Tibshirani, Friedman: The Elements of Statistical Learning James, Witten, Hastie, Tibshirani: An Introduction to Statistical Learning Andrew Ng:

More information

CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS

CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS CHAPTER 4 CLASSIFICATION WITH RADIAL BASIS AND PROBABILISTIC NEURAL NETWORKS 4.1 Introduction Optical character recognition is one of

More information

CSE4334/5334 DATA MINING

CSE4334/5334 DATA MINING CSE4334/5334 DATA MINING Lecture 4: Classification (1) CSE4334/5334 Data Mining, Fall 2014 Department of Computer Science and Engineering, University of Texas at Arlington Chengkai Li (Slides courtesy

More information

Data Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation. Lecture Notes for Chapter 4. Introduction to Data Mining

Data Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation. Lecture Notes for Chapter 4. Introduction to Data Mining Data Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation Lecture Notes for Chapter 4 Introduction to Data Mining by Tan, Steinbach, Kumar (modified by Predrag Radivojac, 2017) Classification:

More information

DATA MINING AND MACHINE LEARNING. Lecture 6: Data preprocessing and model selection Lecturer: Simone Scardapane

DATA MINING AND MACHINE LEARNING. Lecture 6: Data preprocessing and model selection Lecturer: Simone Scardapane DATA MINING AND MACHINE LEARNING Lecture 6: Data preprocessing and model selection Lecturer: Simone Scardapane Academic Year 2016/2017 Table of contents Data preprocessing Feature normalization Missing

More information

Data Mining and Knowledge Discovery: Practice Notes

Data Mining and Knowledge Discovery: Practice Notes Data Mining and Knowledge Discovery: Practice Notes Petra Kralj Novak Petra.Kralj.Novak@ijs.si 2016/01/12 1 Keywords Data Attribute, example, attribute-value data, target variable, class, discretization

More information

Tutorials Case studies

Tutorials Case studies 1. Subject Three curves for the evaluation of supervised learning methods. Evaluation of classifiers is an important step of the supervised learning process. We want to measure the performance of the classifier.

More information

Machine Learning. Classification

Machine Learning. Classification 10-701 Machine Learning Classification Inputs Inputs Inputs Where we are Density Estimator Probability Classifier Predict category Today Regressor Predict real no. Later Classification Assume we want to

More information

ECE 5470 Classification, Machine Learning, and Neural Network Review

ECE 5470 Classification, Machine Learning, and Neural Network Review ECE 5470 Classification, Machine Learning, and Neural Network Review Due December 1. Solution set Instructions: These questions are to be answered on this document which should be submitted to blackboard

More information

CISC 4631 Data Mining

CISC 4631 Data Mining CISC 4631 Data Mining Lecture 03: Introduction to classification Linear classifier Theses slides are based on the slides by Tan, Steinbach and Kumar (textbook authors) Eamonn Koegh (UC Riverside) 1 Classification:

More information

DATA MINING INTRODUCTION TO CLASSIFICATION USING LINEAR CLASSIFIERS

DATA MINING INTRODUCTION TO CLASSIFICATION USING LINEAR CLASSIFIERS DATA MINING INTRODUCTION TO CLASSIFICATION USING LINEAR CLASSIFIERS 1 Classification: Definition Given a collection of records (training set ) Each record contains a set of attributes and a class attribute

More information

CREDIT RISK MODELING IN R. Finding the right cut-off: the strategy curve

CREDIT RISK MODELING IN R. Finding the right cut-off: the strategy curve CREDIT RISK MODELING IN R Finding the right cut-off: the strategy curve Constructing a confusion matrix > predict(log_reg_model, newdata = test_set, type = "response") 1 2 3 4 5 0.08825517 0.3502768 0.28632298

More information

Classification: Basic Concepts, Decision Trees, and Model Evaluation

Classification: Basic Concepts, Decision Trees, and Model Evaluation Classification: Basic Concepts, Decision Trees, and Model Evaluation Data Warehousing and Mining Lecture 4 by Hossen Asiful Mustafa Classification: Definition Given a collection of records (training set

More information

CPSC 340: Machine Learning and Data Mining. Non-Parametric Models Fall 2016

CPSC 340: Machine Learning and Data Mining. Non-Parametric Models Fall 2016 CPSC 340: Machine Learning and Data Mining Non-Parametric Models Fall 2016 Assignment 0: Admin 1 late day to hand it in tonight, 2 late days for Wednesday. Assignment 1 is out: Due Friday of next week.

More information

2. On classification and related tasks

2. On classification and related tasks 2. On classification and related tasks In this part of the course we take a concise bird s-eye view of different central tasks and concepts involved in machine learning and classification particularly.

More information

.. Cal Poly CSC 466: Knowledge Discovery from Data Alexander Dekhtyar.. for each element of the dataset we are given its class label.

.. Cal Poly CSC 466: Knowledge Discovery from Data Alexander Dekhtyar.. for each element of the dataset we are given its class label. .. Cal Poly CSC 466: Knowledge Discovery from Data Alexander Dekhtyar.. Data Mining: Classification/Supervised Learning Definitions Data. Consider a set A = {A 1,...,A n } of attributes, and an additional

More information

Classification and Regression

Classification and Regression Classification and Regression Announcements Study guide for exam is on the LMS Sample exam will be posted by Monday Reminder that phase 3 oral presentations are being held next week during workshops Plan

More information

Bayesian Networks Inference (continued) Learning

Bayesian Networks Inference (continued) Learning Learning BN tutorial: ftp://ftp.research.microsoft.com/pub/tr/tr-95-06.pdf TAN paper: http://www.cs.huji.ac.il/~nir/abstracts/frgg1.html Bayesian Networks Inference (continued) Learning Machine Learning

More information

CSE 5243 INTRO. TO DATA MINING

CSE 5243 INTRO. TO DATA MINING CSE 5243 INTRO. TO DATA MINING Classification (Basic Concepts) Huan Sun, CSE@The Ohio State University 09/12/2017 Slides adapted from UIUC CS412, Fall 2017, by Prof. Jiawei Han Classification: Basic Concepts

More information

Evaluation Metrics. (Classifiers) CS229 Section Anand Avati

Evaluation Metrics. (Classifiers) CS229 Section Anand Avati Evaluation Metrics (Classifiers) CS Section Anand Avati Topics Why? Binary classifiers Metrics Rank view Thresholding Confusion Matrix Point metrics: Accuracy, Precision, Recall / Sensitivity, Specificity,

More information

Contents Machine Learning concepts 4 Learning Algorithm 4 Predictive Model (Model) 4 Model, Classification 4 Model, Regression 4 Representation

Contents Machine Learning concepts 4 Learning Algorithm 4 Predictive Model (Model) 4 Model, Classification 4 Model, Regression 4 Representation Contents Machine Learning concepts 4 Learning Algorithm 4 Predictive Model (Model) 4 Model, Classification 4 Model, Regression 4 Representation Learning 4 Supervised Learning 4 Unsupervised Learning 4

More information

Data Mining D E C I S I O N T R E E. Matteo Golfarelli

Data Mining D E C I S I O N T R E E. Matteo Golfarelli Data Mining D E C I S I O N T R E E Matteo Golfarelli Decision Tree It is one of the most widely used classification techniques that allows you to represent a set of classification rules with a tree. Tree:

More information

Data Mining and Knowledge Discovery Practice notes: Numeric Prediction, Association Rules

Data Mining and Knowledge Discovery Practice notes: Numeric Prediction, Association Rules Keywords Data Mining and Knowledge Discovery: Practice Notes Petra Kralj Novak Petra.Kralj.Novak@ijs.si 06/0/ Data Attribute, example, attribute-value data, target variable, class, discretization Algorithms

More information

Machine Learning: Symbolische Ansätze

Machine Learning: Symbolische Ansätze Machine Learning: Symbolische Ansätze Evaluation and Cost-Sensitive Learning Evaluation Hold-out Estimates Cross-validation Significance Testing Sign test ROC Analysis Cost-Sensitive Evaluation ROC space

More information

Part I. Classification & Decision Trees. Classification. Classification. Week 4 Based in part on slides from textbook, slides of Susan Holmes

Part I. Classification & Decision Trees. Classification. Classification. Week 4 Based in part on slides from textbook, slides of Susan Holmes Week 4 Based in part on slides from textbook, slides of Susan Holmes Part I Classification & Decision Trees October 19, 2012 1 / 1 2 / 1 Classification Classification Problem description We are given a

More information

INTRODUCTION TO MACHINE LEARNING. Measuring model performance or error

INTRODUCTION TO MACHINE LEARNING. Measuring model performance or error INTRODUCTION TO MACHINE LEARNING Measuring model performance or error Is our model any good? Context of task Accuracy Computation time Interpretability 3 types of tasks Classification Regression Clustering

More information

Machine Learning and Bioinformatics 機器學習與生物資訊學

Machine Learning and Bioinformatics 機器學習與生物資訊學 Molecular Biomedical Informatics 分子生醫資訊實驗室 機器學習與生物資訊學 Machine Learning & Bioinformatics 1 Evaluation The key to success 2 Three datasets of which the answers must be known 3 Note on parameter tuning It

More information

Evaluation. Evaluate what? For really large amounts of data... A: Use a validation set.

Evaluation. Evaluate what? For really large amounts of data... A: Use a validation set. Evaluate what? Evaluation Charles Sutton Data Mining and Exploration Spring 2012 Do you want to evaluate a classifier or a learning algorithm? Do you want to predict accuracy or predict which one is better?

More information

Lecture 25: Review I

Lecture 25: Review I Lecture 25: Review I Reading: Up to chapter 5 in ISLR. STATS 202: Data mining and analysis Jonathan Taylor 1 / 18 Unsupervised learning In unsupervised learning, all the variables are on equal standing,

More information

INTRODUCTION TO DATA MINING. Daniel Rodríguez, University of Alcalá

INTRODUCTION TO DATA MINING. Daniel Rodríguez, University of Alcalá INTRODUCTION TO DATA MINING Daniel Rodríguez, University of Alcalá Outline Knowledge Discovery in Datasets Model Representation Types of models Supervised Unsupervised Evaluation (Acknowledgement: Jesús

More information

Nächste Woche. Dienstag, : Vortrag Ian Witten (statt Vorlesung) Donnerstag, 4.12.: Übung (keine Vorlesung) IGD, 10h. 1 J.

Nächste Woche. Dienstag, : Vortrag Ian Witten (statt Vorlesung) Donnerstag, 4.12.: Übung (keine Vorlesung) IGD, 10h. 1 J. 1 J. Fürnkranz Nächste Woche Dienstag, 2. 12.: Vortrag Ian Witten (statt Vorlesung) IGD, 10h 4 Donnerstag, 4.12.: Übung (keine Vorlesung) 2 J. Fürnkranz Evaluation and Cost-Sensitive Learning Evaluation

More information

10 Classification: Evaluation

10 Classification: Evaluation CSE4334/5334 Data Mining 10 Classification: Evaluation Chengkai Li Department of Computer Science and Engineering University of Texas at Arlington Fall 2018 (Slides courtesy of Pang-Ning Tan, Michael Steinbach

More information

CS249: ADVANCED DATA MINING

CS249: ADVANCED DATA MINING CS249: ADVANCED DATA MINING Classification Evaluation and Practical Issues Instructor: Yizhou Sun yzsun@cs.ucla.edu April 24, 2017 Homework 2 out Announcements Due May 3 rd (11:59pm) Course project proposal

More information

Statistics 202: Statistical Aspects of Data Mining

Statistics 202: Statistical Aspects of Data Mining Statistics 202: Statistical Aspects of Data Mining Professor Rajan Patel Lecture 9 = More of Chapter 5 Agenda: 1) Lecture over more of Chapter 5 1 Introduction to Data Mining by Tan, Steinbach, Kumar Chapter

More information

Nonparametric Methods Recap

Nonparametric Methods Recap Nonparametric Methods Recap Aarti Singh Machine Learning 10-701/15-781 Oct 4, 2010 Nonparametric Methods Kernel Density estimate (also Histogram) Weighted frequency Classification - K-NN Classifier Majority

More information

Part I. Instructor: Wei Ding

Part I. Instructor: Wei Ding Classification Part I Instructor: Wei Ding Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 Classification: Definition Given a collection of records (training set ) Each record contains a set

More information

Analytical model A structure and process for analyzing a dataset. For example, a decision tree is a model for the classification of a dataset.

Analytical model A structure and process for analyzing a dataset. For example, a decision tree is a model for the classification of a dataset. Glossary of data mining terms: Accuracy Accuracy is an important factor in assessing the success of data mining. When applied to data, accuracy refers to the rate of correct values in the data. When applied

More information

MEASURING CLASSIFIER PERFORMANCE

MEASURING CLASSIFIER PERFORMANCE MEASURING CLASSIFIER PERFORMANCE ERROR COUNTING Error types in a two-class problem False positives (type I error): True label is -1, predicted label is +1. False negative (type II error): True label is

More information

Lecture Notes for Chapter 4

Lecture Notes for Chapter 4 Classification - Basic Concepts, Decision Trees, and Model Evaluation Lecture Notes for Chapter 4 Slides by Tan, Steinbach, Kumar adapted by Michael Hahsler Look for accompanying R code on the course web

More information

ECE 6504: Advanced Topics in Machine Learning Probabilistic Graphical Models and Large-Scale Learning

ECE 6504: Advanced Topics in Machine Learning Probabilistic Graphical Models and Large-Scale Learning ECE 6504: Advanced Topics in Machine Learning Probabilistic Graphical Models and Large-Scale Learning Topics Bayes Nets (Finish) Parameter Learning Structure Learning Readings: KF 18.1, 18.3; Barber 9.5,

More information

Machine Learning. Decision Trees. Manfred Huber

Machine Learning. Decision Trees. Manfred Huber Machine Learning Decision Trees Manfred Huber 2015 1 Decision Trees Classifiers covered so far have been Non-parametric (KNN) Probabilistic with independence (Naïve Bayes) Linear in features (Logistic

More information

Pattern recognition (4)

Pattern recognition (4) Pattern recognition (4) 1 Things we have discussed until now Statistical pattern recognition Building simple classifiers Supervised classification Minimum distance classifier Bayesian classifier (1D and

More information

CSE Data Mining Concepts and Techniques STATISTICAL METHODS (REGRESSION) Professor- Anita Wasilewska. Team 13

CSE Data Mining Concepts and Techniques STATISTICAL METHODS (REGRESSION) Professor- Anita Wasilewska. Team 13 CSE 634 - Data Mining Concepts and Techniques STATISTICAL METHODS Professor- Anita Wasilewska (REGRESSION) Team 13 Contents Linear Regression Logistic Regression Bias and Variance in Regression Model Fit

More information

CCRMA MIR Workshop 2014 Evaluating Information Retrieval Systems. Leigh M. Smith Humtap Inc.

CCRMA MIR Workshop 2014 Evaluating Information Retrieval Systems. Leigh M. Smith Humtap Inc. CCRMA MIR Workshop 2014 Evaluating Information Retrieval Systems Leigh M. Smith Humtap Inc. leigh@humtap.com Basic system overview Segmentation (Frames, Onsets, Beats, Bars, Chord Changes, etc) Feature

More information

Lecture 7: Decision Trees

Lecture 7: Decision Trees Lecture 7: Decision Trees Instructor: Outline 1 Geometric Perspective of Classification 2 Decision Trees Geometric Perspective of Classification Perspective of Classification Algorithmic Geometric Probabilistic...

More information

CISC 4631 Data Mining

CISC 4631 Data Mining CISC 4631 Data Mining Lecture 05: Overfitting Evaluation: accuracy, precision, recall, ROC Theses slides are based on the slides by Tan, Steinbach and Kumar (textbook authors) Eamonn Koegh (UC Riverside)

More information

Interpretation and evaluation

Interpretation and evaluation Interpretation and evaluation 1. Descriptive tasks Evaluation based on novelty, interestingness, usefulness and understandability Qualitative evaluation: obvious (common sense) knowledge knowledge that

More information

Evaluating generalization (validation) Harvard-MIT Division of Health Sciences and Technology HST.951J: Medical Decision Support

Evaluating generalization (validation) Harvard-MIT Division of Health Sciences and Technology HST.951J: Medical Decision Support Evaluating generalization (validation) Harvard-MIT Division of Health Sciences and Technology HST.951J: Medical Decision Support Topics Validation of biomedical models Data-splitting Resampling Cross-validation

More information

3 Virtual attribute subsetting

3 Virtual attribute subsetting 3 Virtual attribute subsetting Portions of this chapter were previously presented at the 19 th Australian Joint Conference on Artificial Intelligence (Horton et al., 2006). Virtual attribute subsetting

More information

Exam Advanced Data Mining Date: Time:

Exam Advanced Data Mining Date: Time: Exam Advanced Data Mining Date: 11-11-2010 Time: 13.30-16.30 General Remarks 1. You are allowed to consult 1 A4 sheet with notes written on both sides. 2. Always show how you arrived at the result of your

More information

COSC160: Detection and Classification. Jeremy Bolton, PhD Assistant Teaching Professor

COSC160: Detection and Classification. Jeremy Bolton, PhD Assistant Teaching Professor COSC160: Detection and Classification Jeremy Bolton, PhD Assistant Teaching Professor Outline I. Problem I. Strategies II. Features for training III. Using spatial information? IV. Reducing dimensionality

More information

CS570: Introduction to Data Mining

CS570: Introduction to Data Mining CS570: Introduction to Data Mining Classification Advanced Reading: Chapter 8.4 & 8.5 Han, Chapters 4.5 & 4.6 Tan Anca Doloc-Mihu, Ph.D. Slides courtesy of Li Xiong, Ph.D., 2011 Han, Kamber & Pei. Data

More information

Data Mining Classification - Part 1 -

Data Mining Classification - Part 1 - Data Mining Classification - Part 1 - Universität Mannheim Bizer: Data Mining I FSS2019 (Version: 20.2.2018) Slide 1 Outline 1. What is Classification? 2. K-Nearest-Neighbors 3. Decision Trees 4. Model

More information

Performance Evaluation of Various Classification Algorithms

Performance Evaluation of Various Classification Algorithms Performance Evaluation of Various Classification Algorithms Shafali Deora Amritsar College of Engineering & Technology, Punjab Technical University -----------------------------------------------------------***----------------------------------------------------------

More information

The exam is closed book, closed notes except your one-page cheat sheet.

The exam is closed book, closed notes except your one-page cheat sheet. CS 189 Fall 2015 Introduction to Machine Learning Final Please do not turn over the page before you are instructed to do so. You have 2 hours and 50 minutes. Please write your initials on the top-right

More information

SUPERVISED LEARNING WITH SCIKIT-LEARN. How good is your model?

SUPERVISED LEARNING WITH SCIKIT-LEARN. How good is your model? SUPERVISED LEARNING WITH SCIKIT-LEARN How good is your model? Classification metrics Measuring model performance with accuracy: Fraction of correctly classified samples Not always a useful metric Class

More information

Machine Problem 8 - Mean Field Inference on Boltzman Machine

Machine Problem 8 - Mean Field Inference on Boltzman Machine CS498: Applied Machine Learning Spring 2018 Machine Problem 8 - Mean Field Inference on Boltzman Machine Professor David A. Forsyth Auto-graded assignment Introduction Mean-Field Approximation is a useful

More information