Pattern recognition (4)


Things we have discussed until now
- Statistical pattern recognition: building simple classifiers
- Supervised classification: minimum distance classifier, Bayesian classifier (1D and multivariate), building discriminant functions
- Unsupervised classification: the k-means algorithm

Equivalence between classifiers
A Bayesian classifier using multivariate normal class-conditional densities with a common covariance matrix and equal priors is simply a minimum Mahalanobis distance classifier.
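A brief sketch of why this holds (the standard derivation; the slide itself states only the result). With equal priors, the Bayes rule maximizes the class-conditional density, and taking logs of the Gaussian leaves only the Mahalanobis term:

```latex
% Discriminant for class \omega_i, Gaussian density with common \Sigma, equal priors:
g_i(x) = \ln p(x \mid \omega_i)
       = -\tfrac{1}{2}(x-\mu_i)^{\top}\Sigma^{-1}(x-\mu_i)
         - \tfrac{d}{2}\ln 2\pi - \tfrac{1}{2}\ln|\Sigma|
% The last two terms are identical for all classes, so maximizing g_i(x)
% is equivalent to minimizing the squared Mahalanobis distance:
\arg\max_i g_i(x) = \arg\min_i\,(x-\mu_i)^{\top}\Sigma^{-1}(x-\mu_i)
```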

Today
- Classifier design: errors and risk in the classification process
- Performance evaluation of classification systems
Reading: slides and blackboard derivations only

Error
How often will we be wrong? In the two-class case:
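Written out, with R_1 and R_2 the decision regions (the standard form of the two-class error probability):

```latex
P(\text{error}) = P(x \in R_2,\, \omega_1) + P(x \in R_1,\, \omega_2)
               = \int_{R_2} p(x \mid \omega_1)\, P(\omega_1)\, dx
               + \int_{R_1} p(x \mid \omega_2)\, P(\omega_2)\, dx
```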

Global error
Suppose that, using our training samples, we have partitioned the feature space into regions R_i, where region R_i corresponds to class ω_i: every sample falling in this region of the feature space will be classified as ω_i. We can then compute the overall error of the classification process by integrating the class-specific errors over their corresponding regions.
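For N classes this reads (standard form, consistent with the two-class case above):

```latex
P(\text{error}) = \sum_{i=1}^{N} \int_{R_i} \sum_{j \neq i} p(x \mid \omega_j)\, P(\omega_j)\, dx
```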

Risk
Not every mistake has the same cost. Classification strategy: minimize the expected loss (the risk) instead of the probability of error.

Example
Computer-assisted diagnosis of suspicious lesions in a CT scan: two ways to go wrong
- Alpha risk: label a lesion as malignant when in fact it is benign (leads to an unnecessary biopsy)
- Beta risk: label a lesion as benign when in fact it is malignant (leads to progression of the cancer)

Another example
Quality control for manufactured parts: accept or reject a part. Two ways to go wrong
- Alpha risk: accept a bad part (loses customers)
- Beta risk: reject a good part (wastes money)

Loss tables
Suppose that the cost of classifying a pattern into class ω_j when the actual class is ω_i is L_ij. We can summarize these costs in a loss table.

Example (cont'd)
Let ω_1 be the class for good parts and ω_2 be the class for bad parts.
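One plausible loss table for this example, with purely illustrative costs (assumed here, chosen so that accepting a bad part is the expensive mistake), would be:

                     decide ω_1 (accept)   decide ω_2 (reject)
    actual ω_1 (good)      L_11 = 0              L_12 = 1
    actual ω_2 (bad)       L_21 = 10             L_22 = 0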

Risk for a specific pattern X
Loss tables are useful for computing the risk of making a specific decision α_i given pattern X, written R(α_i/X). We can compute this risk by adding up the cost of each possible true class, weighted by its posterior probability.
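In formula form, using the loss-table notation above:

```latex
R(\alpha_i \mid X) = \sum_{j=1}^{N} L_{ji}\, P(\omega_j \mid X)
```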

Computing risks for our example
Suppose that for our earlier example we have computed the posterior probabilities P(ω_2/X) = 0.8 and P(ω_1/X) = 0.2. We will compute the risks of classifying X into ω_1 and into ω_2, respectively.
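Using the illustrative loss table above (L_11 = L_22 = 0, L_12 = 1, L_21 = 10; assumed values, for illustration only), the two risks work out to:

```latex
R(\alpha_1 \mid X) = L_{11}\,P(\omega_1 \mid X) + L_{21}\,P(\omega_2 \mid X)
                   = 0 \cdot 0.2 + 10 \cdot 0.8 = 8
R(\alpha_2 \mid X) = L_{12}\,P(\omega_1 \mid X) + L_{22}\,P(\omega_2 \mid X)
                   = 1 \cdot 0.2 + 0 \cdot 0.8 = 0.2
```

With these costs, R(α_2/X) is far smaller, so the minimum-risk decision is to reject the part (class ω_2).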

Bayesian classifiers revisited
Instead of maximizing the posterior probability P(ω_j/X), minimize the risk R(α_j/X):
- Given N classes, we compute N risk values r_j = R(α_j/X)
- We assign X to the class corresponding to the minimum risk
(Derivation on the blackboard.) A minimal code sketch of this decision rule follows.
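A minimal sketch of the minimum-risk decision rule in Python (illustrative, not from the lecture; the function name `min_risk_decision` and the inputs `loss` and `posteriors` are assumptions):

```python
import numpy as np

def min_risk_decision(loss, posteriors):
    """Minimum-risk Bayesian decision.

    loss[i][j] : cost L_ij of deciding class j when the true class is i
    posteriors : posterior probabilities P(omega_j | X), one per class
    Returns the index of the class with minimum conditional risk.
    """
    loss = np.asarray(loss, dtype=float)
    posteriors = np.asarray(posteriors, dtype=float)
    # Risk of each decision alpha_i: R(alpha_i | X) = sum_j L_ji P(omega_j | X)
    risks = loss.T @ posteriors
    return int(np.argmin(risks))

# The parts example with the illustrative loss table from above:
loss = [[0, 1], [10, 0]]           # rows: true class, columns: decision
posteriors = [0.2, 0.8]            # P(omega_1 | X), P(omega_2 | X)
print(min_risk_decision(loss, posteriors))  # 1, i.e. decide omega_2 (reject)
```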

The 0-1 loss rule
The 0-1 loss assigns no cost to correct decisions and a unit cost to every error: L_ij = 0 if i = j, and L_ij = 1 otherwise. The risk then reduces to R(α_i/X) = 1 - P(ω_i/X), so under this rule the Bayesian classifier maximizes the posterior probability (as we learned in the previous lecture) and can be expressed as a minimum distance classifier. This is the most common assumption in computer vision classifiers!

Performance evaluation paradigms
- Against ground truth (manually generated segmentation/classification): the method of preference in medical image segmentation
- Benchmarking: for mature/maturing subfields in computer vision
  Example 1: "The gait identification challenge problem: datasets and baseline algorithm", International Conference on Pattern Recognition, 2002
  Example 2: "Benchmark Studies on Face Recognition", International Workshop on Automatic Face- and Gesture-Recognition, 1995

Evaluation of classifiers
- ROC analysis
- Precision and recall
- Confusion matrices

ROC analysis
ROC stands for receiver operating characteristic; the technique was initially used to analyze and compare the performance of human radar operators. A ROC curve is a plot of the false positive rate against the true positive rate as some parameter (typically a decision threshold) is varied. Around 1970, ROC curves came into use in medical studies, where they are useful in bringing out the sensitivity (true positive rate) versus the specificity (true negative rate, i.e. one minus the false positive rate) of diagnostic trials. In computer vision we perform ROC analysis on algorithms; it also lets us compare different algorithms designed for the same task.

ROC terminology
Four kinds of outcomes:
- TP: we say yes and are right (true positive): a hit
- TN: we say no and are right (true negative): a correct rejection
- FP: we say yes and are wrong (false positive): a false alarm
- FN: we say no and are wrong (false negative): a miss
We do not actually need all four rates, because they come in complementary pairs: the false negative rate is 1 - the true positive rate, and the true negative rate is 1 - the false positive rate.

False positives, false negatives (figure)

ROC curves
ROC curves show the trade-off between the true positive rate and the false positive rate: as the operating threshold is varied, an increase in the true positive rate is accompanied by an increase in the false positive rate. The area under each curve gives a measure of accuracy.

ROC curve
- The closer the curve approaches the top left-hand corner of the plot, the more accurate the classifier.
- The closer the curve is to the 45° diagonal, the worse the classifier.
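A minimal sketch of how such a curve is traced in practice, assuming a binary classifier that outputs scores (illustrative code, not part of the lecture; `roc_curve`, `scores`, and `labels` are assumed names):

```python
import numpy as np

def roc_curve(scores, labels):
    """Trace a ROC curve by sweeping a decision threshold.

    scores : classifier outputs, higher means 'more positive'
    labels : ground-truth labels, 1 = positive, 0 = negative
    Returns arrays of (false positive rate, true positive rate),
    one point per candidate threshold.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    fpr, tpr = [], []
    # Sweep every observed score as a threshold, plus one above the maximum
    for t in np.append(np.unique(scores), scores.max() + 1):
        predicted = scores >= t
        tpr.append((predicted & (labels == 1)).sum() / n_pos)
        fpr.append((predicted & (labels == 0)).sum() / n_neg)
    return np.array(fpr), np.array(tpr)

# Toy usage: a few scores and labels
fpr, tpr = roc_curve([0.9, 0.8, 0.6, 0.4, 0.3], [1, 1, 0, 1, 0])
print(list(zip(fpr.round(2), tpr.round(2))))
```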

Where are ROC curves helpful?
Detection-type problems:
- Face detection in images/video data
- Event detection in video data
- Lesion detection in medical images
- Etc.

Precision and recall
Also used mostly for detection-type problems. In the multiple-class case, they can be measured for each class:

precision = (no. of correct detections) / (total no. of detections) = true C1 / (true C1 + false alarms)

recall = (no. of correct detections) / (total no. of C1 samples in the database) = true C1 / (true C1 + missed detections)

Trade-off between precision and recall
Example: content-based image retrieval. Suppose we aim at detecting all sunset images in an image database:
- The database contains 200 sunset images
- The classifier retrieves 150 of the 200 relevant images, plus 100 images of no interest to the user
- Precision = 150/250 = 60%
- Recall = 150/200 = 75%
The system could obtain 100 percent recall if it returned every image in the database, but its precision would be terrible. Conversely, if we aim at a low false-alarm rate, precision will be high but recall will be low.
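A small sketch that checks these numbers (the helper functions here are illustrative, not from the lecture):

```python
def precision(true_detections, false_alarms):
    """Fraction of all detections that are correct."""
    return true_detections / (true_detections + false_alarms)

def recall(true_detections, missed_detections):
    """Fraction of all relevant items that are detected."""
    return true_detections / (true_detections + missed_detections)

# Sunset-retrieval example: 150 of 200 relevant images found, 100 false alarms
print(precision(150, 100))      # 0.6  -> 60%
print(recall(150, 200 - 150))   # 0.75 -> 75%
```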

Confusion matrix
Used for visualizing/reporting the results of a classification system. Typically, each row corresponds to a true class and each column to a predicted class, so entry (i, j) counts the samples of class ω_i that were classified as ω_j.

The binary confusion matrix
For any single class we can construct a binary (2x2) confusion matrix, whose entries are the true positives, false positives, false negatives, and true negatives for that class.

Calculating precision and recall from the confusion matrix
Example. Consider the confusion matrix of an OCR system over a test document set. Calculate the precision and recall for class a.
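As an illustration, consider a hypothetical OCR confusion matrix over three characters (rows = true class, columns = predicted class; all counts are invented for this example):

                predicted a   predicted b   predicted c
    actual a        50             3             2
    actual b         4            60             1
    actual c         6             2            70

precision(a) = 50 / (50 + 4 + 6) = 50/60 ≈ 0.83 (correct a's among everything classified as a, i.e. the column sum)
recall(a) = 50 / (50 + 3 + 2) = 50/55 ≈ 0.91 (correct a's among all actual a's, i.e. the row sum)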