Probabilistic Classifiers (DWML)
1 Probabilistic Classifiers
2-3 Probabilistic Classifiers: Conditional class probabilities

  Id.  Savings  Assets   Income  Credit risk
  1    Medium   High     75      Good
  2    Low      Low      50      Bad
  3    High     Medium   25      Bad
  4    Medium   High     75      Good
  5    Low      Medium   100     Good
  6    High     High     25      Good
  7    Medium   High     75      Bad
  8    Medium   Medium   75      Good

P(Risk = Good | Savings = Medium, Assets = High, Income = 75) = 2/3
P(Risk = Bad  | Savings = Medium, Assets = High, Income = 75) = 1/3
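As a concrete illustration of how these conditional class probabilities fall out of the data, the following is a minimal Python sketch (not from the slides; the function name is illustrative) that simply counts matching instances in the credit-risk table:

```python
from collections import Counter

# Credit-risk training data from the slide: (Savings, Assets, Income, Risk)
data = [
    ("Medium", "High",   75,  "Good"),
    ("Low",    "Low",    50,  "Bad"),
    ("High",   "Medium", 25,  "Bad"),
    ("Medium", "High",   75,  "Good"),
    ("Low",    "Medium", 100, "Good"),
    ("High",   "High",   25,  "Good"),
    ("Medium", "High",   75,  "Bad"),
    ("Medium", "Medium", 75,  "Good"),
]

def conditional_class_probs(data, savings, assets, income):
    """Empirical P(Risk = c | Savings, Assets, Income) for each class c."""
    matching = [risk for s, a, i, risk in data if (s, a, i) == (savings, assets, income)]
    counts = Counter(matching)
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}

# Reproduces the slide: {'Good': 0.667, 'Bad': 0.333}
print(conditional_class_probs(data, "Medium", "High", 75))
```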
4 Probabilistic Classifiers: Empirical Distribution

The training data defines the empirical distribution, which can be represented in a table. Empirical distribution obtained from 1000 data instances:

  Gender  Blood Pressure  Weight  Smoker  Stroke  P
  m       low             under   no      no      32/1000
  m       low             under   no      yes     1/1000
  m       low             under   yes     no      27/1000
  ...
  f       normal          normal  no      yes     0/1000
  ...
  f       high            over    yes     yes     54/1000

Such a table is not a suitable probabilistic model, because
- the size of the representation grows exponentially in the number of attributes, and
- it overfits the data.
5 Probabilistic Classifiers: Model

View the data as being produced by a random process that is described by a joint probability distribution P on States(A_1, ..., A_n, C), i.e. P assigns a probability P(a_1, ..., a_n, c) \in [0, 1] to every tuple (a_1, ..., a_n, c) of values for the attribute and class variables, such that

    \sum_{(a_1, \dots, a_n, c) \in \mathrm{States}(A_1, \dots, A_n, C)} P(a_1, \dots, a_n, c) = 1

(for discrete attributes; integration instead of summation for continuous attributes).

Conditional Probability

The joint distribution P also defines the conditional probability distribution of C given A_1, ..., A_n, i.e. the values

    P(c \mid a_1, \dots, a_n) := \frac{P(a_1, \dots, a_n, c)}{P(a_1, \dots, a_n)} = \frac{P(a_1, \dots, a_n, c)}{\sum_{c'} P(a_1, \dots, a_n, c')}

that represent the probability that C = c given that it is known that A_1 = a_1, ..., A_n = a_n.
6 Probabilistic Classifiers: Classification Rule

    C(a_1, \dots, a_n) := \arg\max_{c \in \mathrm{States}(C)} P(c \mid a_1, \dots, a_n)

In the binary case, e.g. States(C) = {not infected, infected}, also with a variable threshold t:

    C(a_1, \dots, a_n) = \text{not infected} \iff P(\text{not infected} \mid a_1, \dots, a_n) \geq t.

(This can also be generalized for non-binary attributes.)
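The rule above translates directly into code. Below is a minimal sketch (not from the slides; the function name and labels are illustrative) of the arg-max rule with an optional threshold for the binary case:

```python
def classify(cond_probs, threshold=None, positive_label=None):
    """Arg-max classification rule over conditional class probabilities.

    cond_probs maps each class label c to P(c | a_1, ..., a_n).
    If a threshold is given (binary case), predict positive_label exactly
    when its conditional probability is at least the threshold.
    """
    if threshold is not None and positive_label is not None:
        others = [c for c in cond_probs if c != positive_label]
        return positive_label if cond_probs[positive_label] >= threshold else others[0]
    return max(cond_probs, key=cond_probs.get)

print(classify({"Good": 2/3, "Bad": 1/3}))                                         # arg-max: "Good"
print(classify({"Good": 2/3, "Bad": 1/3}, threshold=0.8, positive_label="Good"))   # threshold rule: "Bad"
```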
7 Naive Bayes: The Naive Bayes Model

Structural assumption:

    P(a_1, \dots, a_n, c) = P(a_1 \mid c) \cdot P(a_2 \mid c) \cdots P(a_n \mid c) \cdot P(c)

Graphical representation as a Bayesian network:

[Figure: class node C with an arrow to each attribute node A_1, ..., A_7.]

Interpretation: given the true class labels, the different attributes take their values independently.
8 Naive Bayes: The naive Bayes assumption I

For example:

    P(Cell-2 = b | Cell-5 = b, Symbol = 1) > P(Cell-2 = b | Symbol = 1)

The attributes are not independent given Symbol = 1!
9 Naive Bayes: The naive Bayes assumption II

For the spam example, e.g.:

    P(Body_nigeria = y | Body_confidential = y, Spam = y) > P(Body_nigeria = y | Spam = y)

The attributes are not independent given Spam = yes! The Naive Bayes assumption is often not realistic. Nevertheless, Naive Bayes is often successful.
10 Naive Bayes: Learning a Naive Bayes Classifier

Determine the parameters P(a_i | c) (a_i \in States(A_i), c \in States(C)) from empirical counts in the data.
- Missing values are easily handled: instances for which A_i is missing are ignored for P(a_i | c).
- Discrete and continuous attributes can be mixed.
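A minimal sketch of this count-based learning together with the corresponding arg-max prediction (not from the slides; it treats every attribute, including Income, as discrete, uses no smoothing, and all names are illustrative):

```python
from collections import Counter, defaultdict

def fit_naive_bayes(rows, labels):
    """Estimate P(c) and P(a_i | c) from empirical counts.

    rows: attribute tuples (None marks a missing value); labels: class labels.
    Instances with a missing A_i are simply ignored when estimating P(a_i | c).
    """
    class_counts = Counter(labels)
    priors = {c: n / len(labels) for c, n in class_counts.items()}
    cond = defaultdict(Counter)   # cond[(i, c)]: counts of values of attribute i within class c
    for row, c in zip(rows, labels):
        for i, a in enumerate(row):
            if a is not None:
                cond[(i, c)][a] += 1
    def p_attr(i, a, c):
        counts = cond[(i, c)]
        total = sum(counts.values())
        return counts[a] / total if total else 0.0
    return priors, p_attr

def predict_naive_bayes(priors, p_attr, row):
    """Arg-max over c of P(c) * prod_i P(a_i | c), the Naive Bayes factorization."""
    scores = {}
    for c, prior in priors.items():
        score = prior
        for i, a in enumerate(row):
            if a is not None:
                score *= p_attr(i, a, c)
        scores[c] = score
    return max(scores, key=scores.get)

# Credit-risk data from slide 2, with Income treated as a discrete attribute
rows = [("Medium", "High", 75), ("Low", "Low", 50), ("High", "Medium", 25),
        ("Medium", "High", 75), ("Low", "Medium", 100), ("High", "High", 25),
        ("Medium", "High", 75), ("Medium", "Medium", 75)]
labels = ["Good", "Bad", "Bad", "Good", "Good", "Good", "Bad", "Good"]
priors, p_attr = fit_naive_bayes(rows, labels)
print(predict_naive_bayes(priors, p_attr, ("Medium", "High", 75)))   # "Good"
```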
11-12 Naive Bayes: The paradoxical success of Naive Bayes

One explanation for the surprisingly good performance of Naive Bayes in many domains: one does not require the exact distribution for classification, only the right decision boundaries [Domingos, Pazzani 97].

[Figure: P(C | a_1, ..., a_n) plotted over States(A_1, ..., A_n), once for the real distribution and once for the Naive Bayes approximation.]
13 Naive Bayes: When Naive Bayes must fail

No Naive Bayes classifier can produce the following classification:

  A    B    Class
  yes  yes  +
  yes  no   -
  no   yes  -
  no   no   +

because, assume it did; then:

1. P(A = y | +) P(B = y | +) P(+) > P(A = y | -) P(B = y | -) P(-)
2. P(A = y | -) P(B = n | -) P(-) > P(A = y | +) P(B = n | +) P(+)
3. P(A = n | -) P(B = y | -) P(-) > P(A = n | +) P(B = y | +) P(+)
4. P(A = n | +) P(B = n | +) P(+) > P(A = n | -) P(B = n | -) P(-)
14 Naive Bayes

Multiplying the four left sides and the four right sides of these inequalities gives

    \prod_{i=1}^{4} (\text{left side of } i) > \prod_{i=1}^{4} (\text{right side of } i)

But this is false, because both products are actually equal (both sides contain exactly the same factors, just distributed differently over the four inequalities).
15 Naive Bayes: Tree Augmented Naive Bayes (TAN)

Model: all Bayesian network structures where
- the class node is a parent of each attribute node, and
- the substructure on the attribute nodes is a tree.

[Figure: network with class node C pointing to attribute nodes A_1, ..., A_7, plus tree edges among the attribute nodes.]

Learning a TAN classifier means learning the tree structure and the parameters. The optimal tree structure can be found efficiently (Chow, Liu 1968; Friedman et al. 1997).
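A rough sketch of the Chow-Liu-style structure step used for TAN (Friedman et al. 1997): weight each attribute pair by the empirical conditional mutual information given the class, then take a maximum-weight spanning tree. This is not the slides' code; the function names are illustrative, smoothing is omitted, and directing the tree from a chosen root is left out:

```python
from collections import Counter
from itertools import combinations
from math import log

def conditional_mutual_information(x, y, c):
    """Empirical I(X; Y | C) from three parallel lists of discrete values."""
    n = len(c)
    joint = Counter(zip(x, y, c))
    xc, yc, cc = Counter(zip(x, c)), Counter(zip(y, c)), Counter(c)
    mi = 0.0
    for (xv, yv, cv), n_xyc in joint.items():
        # p(x,y,c) * log( p(x,y|c) / (p(x|c) p(y|c)) ), written with raw counts
        mi += (n_xyc / n) * log(n_xyc * cc[cv] / (xc[(xv, cv)] * yc[(yv, cv)]))
    return mi

def tan_tree(columns, labels):
    """Maximum-weight spanning tree over attributes, weighted by I(A_i; A_j | C).

    columns: one list of values per attribute; labels: the class values.
    Returns a list of undirected tree edges (i, j) over attribute indices.
    """
    n_attr = len(columns)
    weighted = sorted(
        ((conditional_mutual_information(columns[i], columns[j], labels), i, j)
         for i, j in combinations(range(n_attr), 2)),
        reverse=True)
    parent = list(range(n_attr))              # union-find for Kruskal's algorithm
    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u
    tree = []
    for _, i, j in weighted:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((i, j))
    return tree

# Example with the credit-risk attributes (Savings, Assets, Income) and Risk as the class
columns = [["Medium", "Low", "High", "Medium", "Low", "High", "Medium", "Medium"],
           ["High", "Low", "Medium", "High", "Medium", "High", "High", "Medium"],
           [75, 50, 25, 75, 100, 25, 75, 75]]
labels = ["Good", "Bad", "Bad", "Good", "Good", "Good", "Bad", "Good"]
print(tan_tree(columns, labels))   # two edges connecting the three attributes
```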
16 Naive Bayes

TAN classifier for the classification from slide 13 (A = yes/yes/no/no, B = yes/no/yes/no, Class = +/-/-/+):

[Figure: a TAN network over C, A and B together with its conditional probability tables (the class prior P(C), P(A | C) and P(B | A, C)); the numerical table entries are not preserved in this transcription.]
17 Evaluating Classifiers
18 Evaluating Classifiers: Validation

Evaluation: estimate of the performance of a classifier on future data. The estimate is obtained by measuring performance on a validation set (distinct from the test set used for parameter tuning!), or by cross-validation.

Classification Error

A classifier C (e.g. a decision tree) is used to classify instances a_1, ..., a_N with true class labels c_1, ..., c_N. The class labels assigned by C are c'_1, ..., c'_N. Classification error:

    |{i \in {1, ..., N} : c_i \neq c'_i}| / N
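A one-line sketch of this error measure (not from the slides):

```python
def classification_error(true_labels, predicted_labels):
    """Fraction of instances whose predicted label differs from the true label."""
    return sum(t != p for t, p in zip(true_labels, predicted_labels)) / len(true_labels)

print(classification_error(["Good", "Bad", "Good"], ["Good", "Good", "Good"]))  # 0.333...
```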
19 Evaluating Classifiers: Expected Loss

A more detailed picture is provided by the confusion matrix and a cost function (e.g. for States(C) = {a, b, c} and n = 150):

Confusion matrix (fractions of cases with each true/predicted combination):

                 true
  predicted      a        b        c
  a              45/150   4/150    3/150
  b              2/150    39/150   1/150
  c              3/150    7/150    46/150

Loss matrix: Loss(x, y) gives the cost of each true/predicted combination (the numerical entries are not preserved in this transcription).

Expected Loss:

    \sum_{x, y \in \{a, b, c\}} \mathrm{Confusion}(x, y) \cdot \mathrm{Loss}(x, y)

When a cost function is given, try to minimize the expected loss (minimizing the classification error is the special case of 0-1 loss: Loss(x, x) = 0 and Loss(x, y) = 1 for x \neq y)!
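A minimal sketch of the expected-loss computation (not from the slides), using the confusion matrix above and a 0-1 loss for illustration:

```python
import numpy as np

def expected_loss(confusion, loss):
    """Sum of Confusion(x, y) * Loss(x, y) over all true/predicted combinations."""
    return float(np.sum(np.asarray(confusion) * np.asarray(loss)))

# Confusion matrix from the slide (rows: predicted a, b, c; columns: true a, b, c)
confusion = np.array([[45, 4, 3],
                      [2, 39, 1],
                      [3, 7, 46]]) / 150

zero_one_loss = 1 - np.eye(3)   # Loss(x, x) = 0, Loss(x, y) = 1 for x != y
print(expected_loss(confusion, zero_one_loss))  # classification error: 20/150
```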
20 Evaluating Classifiers: Classifiers with Confidence

Most classifiers (implicitly) provide a numeric measurement of the likelihood of class label c for instance a:
- Probabilistic classifier: probability of c given a.
- Decision tree: frequency of label c (among training cases) in the leaf reached by a.
- k-nearest-neighbor: frequency of label c among the k nearest neighbors of a.
- Neural network: output value of the c output neuron given input a.
21-23 Evaluating Classifiers: Quantiles

For a given class label c, sort the instances according to decreasing confidence P(c):

  Instance:  a_3   a_5   a_1   a_7   a_8   a_4   a_2   a_10  a_6   a_9
  c_i = c:   yes   yes   no    yes   yes   no    yes   no    no    no

The 40% quantile consists of the 40% of cases with the highest confidence in c. Given the correct class labels, we can compute the accuracy in the 40% quantile (3/4) and the ratio of this accuracy to the base rate of label c:

    Lift(40%, C, c) = (3/4) / (5/10) = 1.5
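A minimal sketch of the lift computation (not from the slides). The confidence values below are made up to reproduce the ordering on the slide, since the actual P(c) values are not preserved here:

```python
def lift(confidences, is_target, quantile):
    """Accuracy within the top `quantile` of cases (ranked by confidence in c),
    divided by the base rate of label c."""
    order = sorted(range(len(confidences)), key=lambda i: -confidences[i])
    k = round(quantile * len(order))
    top_hits = [is_target[i] for i in order[:k]]
    accuracy_in_quantile = sum(top_hits) / k
    base_rate = sum(is_target) / len(is_target)
    return accuracy_in_quantile / base_rate

conf = [0.95, 0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]   # hypothetical P(c) values
hits = [True, True, False, True, True, False, True, False, False, False]
print(lift(conf, hits, 0.4))  # 1.5
```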
24-25 Evaluating Classifiers: Lift Charts

Lift plotted for different quantiles gives a lift chart.

[Figure: lift charts, Lift(C, c) plotted against the quantile.]

Lift for a classifier C generating a perfect ordering:

  Instance:  a_7   a_5   a_2   a_3   a_8   a_9   a_1   a_10  a_6   a_4
  c_i = c:   yes   yes   yes   yes   yes   no    no    no    no    no
26 Evaluating Classifiers: Lift and Costs

What is better: predicting C = c for all instances in the 40% quantile (say lift = 1.5) and C \neq c for all others, or predicting C = c for all instances in the 60% quantile (say lift = 1.333) and C \neq c for all others?

That depends on the cost function! The first option will be better when wrong predictions of C = c are very expensive; the second option will be better when wrong predictions of C \neq c are very expensive.
27-31 Evaluating Classifiers: ROC Space

Confusion matrix for binary classification problems:

                 true
  predicted      pos                    neg
  pos            true positives (tp)    false positives (fp)
  neg            false negatives (fn)   true negatives (tn)

True positive rate (tpr): tp / (tp + fn)
False positive rate (fpr): fp / (fp + tn)

Each classifier (applied to some dataset) defines a point (fpr, tpr) in ROC space.

[Figure: ROC space with fpr on the x-axis and tpr on the y-axis; marked points: "always classify positive" at (1, 1), "always classify negative" at (0, 0), "classify positive with probability q" at (q, q), and "perfect classification" at (0, 1).]
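A minimal sketch of the (fpr, tpr) point computation (not from the slides):

```python
def roc_point(true_labels, predicted_labels, positive=True):
    """Return the (fpr, tpr) point of a binary classifier on a dataset."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return fp / (fp + tn), tp / (tp + fn)

print(roc_point([True, True, False, False], [True, False, True, False]))  # (0.5, 0.5)
```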
32 Evaluating Classifiers: Comparison

One classifier is strictly better than another if its tpr/fpr point lies to the left of and above the other's in ROC space.

[Figure: ROC space with three classifiers C_1, C_2, C_3. C_1 is better than C_2; C_3 is incomparable with C_1 and C_2.]
33-34 Evaluating Classifiers: ROC Curves

Probabilistic classifiers (and many others) are parameterized by an acceptance threshold. Plotting the tpr/fpr values for all parameter values (and a given dataset) gives a ROC curve.

[Figure: a ROC curve in ROC space, fpr on the x-axis and tpr on the y-axis.]

Performance measure for a (parameterized family of) classifiers: the area under the ROC curve (AUC).
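A minimal sketch of sweeping the threshold and computing AUC with the trapezoidal rule (not from the slides; it assumes distinct scores and does not handle ties):

```python
import numpy as np

def roc_curve_points(scores, labels):
    """Sweep the acceptance threshold over all scores; return arrays of (fpr, tpr) points."""
    scores, labels = np.asarray(scores, float), np.asarray(labels, bool)
    order = np.argsort(-scores)            # decreasing confidence in the positive class
    labels = labels[order]
    tpr = np.concatenate(([0.0], np.cumsum(labels) / labels.sum()))
    fpr = np.concatenate(([0.0], np.cumsum(~labels) / (~labels).sum()))
    return fpr, tpr

def auc(fpr, tpr):
    """Area under the ROC curve (trapezoidal rule)."""
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))

scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 1, 0, 0, 0]
fpr, tpr = roc_curve_points(scores, labels)
print(auc(fpr, tpr))  # 0.875
```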
35 Optimizing Predictive Performance: Overfitting again

[Figure: a performance measure plotted against a model parameter, with separate curves for training data and future data.]

Possible performance measures:
- Misclassification rate
- Expected loss
- AUC
- ...

Model parameters:
- Pruning parameter for decision trees
- k in k-nearest neighbor
- Complexity of the probabilistic model (e.g. Naive Bayes, TAN, ...)
- ...

How do we determine the model performing best on future data?
36 Optimizing Predictive Performance: Test Set

- Set aside part (e.g. one third) of the available data as a test set.
- Learn models with different parameter settings, using the remaining data as training data.
- Measure the performance of each learned model on the test set.
- Choose the parameter setting with the best performance.
- Learn the final model with the chosen parameter setting using the whole available data.

Problem: for small datasets one cannot afford to set aside a test set.
37 Optimizing Predictive Performance: Cross-Validation

- Partition the data into n subsets or folds (typically n = 10).
- For each model parameter setting:
  - for i = 1 to n: learn a model using folds 1, ..., i-1, i+1, ..., n as training data and measure its performance on fold i
  - model performance = the average performance over the n test folds
- Choose the parameter setting with the best performance.
- Learn the final model with the chosen parameter setting using the whole available data.

A minimal sketch of this selection loop is given below.
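This sketch is not from the slides; learn_with_param and evaluate are placeholders for the actual learner and performance measure (e.g. accuracy, or the negative expected loss):

```python
import random

def cross_validation_score(learn, evaluate, data, n_folds=10, seed=0):
    """Average performance of a learner over n folds (higher is better)."""
    data = list(data)
    random.Random(seed).shuffle(data)
    folds = [data[i::n_folds] for i in range(n_folds)]
    scores = []
    for i in range(n_folds):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        model = learn(train)
        scores.append(evaluate(model, folds[i]))
    return sum(scores) / n_folds

def select_parameter(learn_with_param, evaluate, data, parameter_values, n_folds=10):
    """Pick the parameter with the best cross-validated score,
    then learn the final model on the whole dataset."""
    best = max(parameter_values,
               key=lambda p: cross_validation_score(
                   lambda train: learn_with_param(train, p), evaluate, data, n_folds))
    return best, learn_with_param(data, best)
```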