Pattern recognition (4)

Jacob Alexander
6 years ago

1 Pattern recognition (4) 1

2 Things we have discussed until now
- Statistical pattern recognition
- Building simple classifiers
- Supervised classification: minimum distance classifier, Bayesian classifier (1D and multiple D), building discriminant functions
- Unsupervised classification: K-means algorithm

3 Equivalence between classifiers
Pattern recognition using multivariate normal distributions and equal priors is simply a minimum Mahalanobis distance classifier.
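The equivalence can be checked from the Gaussian discriminant function. The sketch below assumes, beyond what the slide states, that all classes share one covariance matrix $\Sigma$; the symbols $g_i$, $\mu_i$ and $d$ (feature dimension) are notation introduced here for illustration.

```latex
\begin{align*}
g_i(x) &= \ln p(x \mid \omega_i) + \ln P(\omega_i) \\
       &= -\tfrac{1}{2}(x-\mu_i)^{\top}\Sigma^{-1}(x-\mu_i)
          - \tfrac{d}{2}\ln 2\pi - \tfrac{1}{2}\ln\lvert\Sigma\rvert + \ln P(\omega_i)
\end{align*}
% With equal priors and a shared \Sigma, the last three terms are identical for
% every class, so maximizing g_i(x) is the same as minimizing the squared
% Mahalanobis distance:
\[
\hat{\imath} = \arg\max_i g_i(x) = \arg\min_i \,(x-\mu_i)^{\top}\Sigma^{-1}(x-\mu_i)
\]
```

If each class had its own covariance matrix $\Sigma_i$, the term $-\tfrac{1}{2}\ln\lvert\Sigma_i\rvert$ would no longer cancel and the rule would not reduce to a pure distance comparison.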

4 Today
- Classifier design: errors and risk in the classification process
- Performance evaluation of classification systems
- Reading: slides and blackboard derivations only

5 Error
How often will we be wrong? In the two-class case:

6 Global error
Suppose that, using our training samples, we have partitioned the feature space into regions R_i. Region R_i corresponds to class ω_i if all samples falling in this region of the feature space are classified into class ω_i. We can then compute the overall error of the classification process by integrating the class-specific errors over their corresponding regions.
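The formulas for these slides did not survive in the text, so the following is the standard textbook form of the error, given as a sketch rather than as the exact expression shown in the lecture. It assumes class-conditional densities $p(x \mid \omega_i)$, priors $P(\omega_i)$, and $c$ classes (all notation introduced here).

```latex
% Two-class case: an error occurs when x falls in the region of the other class.
\[
P(\text{error}) = \int_{R_2} p(x \mid \omega_1)\,P(\omega_1)\,dx
                + \int_{R_1} p(x \mid \omega_2)\,P(\omega_2)\,dx
\]
% General case with c classes: one minus the probability of being correct.
\[
P(\text{error}) = 1 - \sum_{i=1}^{c} \int_{R_i} p(x \mid \omega_i)\,P(\omega_i)\,dx
\]
```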

7 Risk
Not every mistake has the same cost. Classification strategy: minimize the expected loss (risk) instead of minimizing the probability of error.

8 Example
Computer-assisted diagnosis of suspicious lesions in a CT scan. Two ways to go wrong:
- Alpha risk: label a lesion as malignant when in fact it is benign. Consequence: unnecessary biopsy.
- Beta risk: label a lesion as benign when in fact it is malignant. Consequence: progression of cancer.

9 Another example
Quality control for manufactured parts: accept or reject a part. Two ways to go wrong:
- Alpha risk: accept a bad part. Consequence: loses customers.
- Beta risk: reject a good part. Consequence: wastes money.

10 Loss tables
Suppose that the cost of classifying into class ω_j when the actual class is ω_i is L_ij. We can summarize these costs in a loss table.

11 Example (cont'd)
Let ω_1 be the class for good parts and ω_2 be the class for bad parts.

12 Risk for a specific pattern X
Loss tables are useful for computing the risk of making a specific choice α_i given pattern X: R(α_i / X). We can compute this risk by adding up the costs for each possible true class, each weighted by the posterior probability of that class.
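Written out, the weighted sum takes the following standard form. This is a sketch using the loss-table convention of slide 10 ($L_{ij}$ is the cost of deciding $\omega_j$ when the true class is $\omega_i$); the index $j$ for the chosen class and $N$ for the number of classes are notation chosen here, and $\mid$ plays the role of the "/" used on the slides.

```latex
\[
R(\alpha_j \mid X) = \sum_{i=1}^{N} L_{ij}\, P(\omega_i \mid X)
\]
% For two classes this expands to
\[
R(\alpha_1 \mid X) = L_{11} P(\omega_1 \mid X) + L_{21} P(\omega_2 \mid X), \qquad
R(\alpha_2 \mid X) = L_{12} P(\omega_1 \mid X) + L_{22} P(\omega_2 \mid X)
\]
```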

13 Computing risks for our example
Suppose that for our earlier example we have computed the posterior probabilities P(ω_2 / X) = 0.8; P(ω / X) =
We will compute the risks of classifying X into ω_1 and into ω_2, respectively.

14 Bayesian classifiers revisited
- Instead of maximizing the posterior probability P(ω_j / X), minimize the risk R(α_j / X).
- Given N classes, we compute N risk values r_j = R(α_j / X).
- We assign X to the class corresponding to the minimum risk.
Derivation..
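The minimum-risk rule can be sketched in a few lines of Python. The loss table below is hypothetical (the table from the lecture is not available in this text), and the posterior for ω_1 is assumed to be 0.2 so that the two posteriors sum to 1; the function names are illustrative only.

```python
def conditional_risks(loss, posteriors):
    """Return the list of risks R(alpha_j | X), one per possible decision j.

    loss[i][j]    -- cost of deciding class j when the true class is i
    posteriors[i] -- posterior probability P(omega_i | X)
    """
    n = len(posteriors)
    return [sum(loss[i][j] * posteriors[i] for i in range(n)) for j in range(n)]


def min_risk_decision(loss, posteriors):
    """Return (index of the minimum-risk decision, list of all risks)."""
    risks = conditional_risks(loss, posteriors)
    best = min(range(len(risks)), key=risks.__getitem__)
    return best, risks


# Hypothetical loss table: index 0 = good part (omega_1), 1 = bad part (omega_2).
LOSS = [
    [0.0, 1.0],   # true class good: accepting costs 0, rejecting costs 1
    [10.0, 0.0],  # true class bad: accepting costs 10, rejecting costs 0
]
# P(omega_2 | X) = 0.8 as on the slide; P(omega_1 | X) = 0.2 is an assumption.
POSTERIORS = [0.2, 0.8]

decision, risks = min_risk_decision(LOSS, POSTERIORS)
print("risks:", risks, "decision index:", decision)
```

With these illustrative numbers the risk of accepting the part is about 8.0 and the risk of rejecting it is about 0.2, so the part is assigned to ω_2 (bad).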

15 The 0-1 loss rule
Under this rule, the Bayesian classifier maximizes the posterior probability (as we learned in the previous lecture) and can be expressed as a minimum distance classifier. This is the most common assumption in computer vision classifiers!
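Why the 0-1 loss leads back to the maximum-posterior rule can be seen in one step. The sketch below uses the convention that $L_{ij}$ is the cost of deciding $\omega_j$ when the true class is $\omega_i$, with $N$ classes; $\mid$ stands for the "/" used on the slides.

```latex
\[
L_{ij} =
\begin{cases}
0, & i = j \\
1, & i \neq j
\end{cases}
\]
\[
R(\alpha_j \mid X) = \sum_{i=1}^{N} L_{ij}\, P(\omega_i \mid X)
                   = \sum_{i \neq j} P(\omega_i \mid X)
                   = 1 - P(\omega_j \mid X)
\]
% Minimizing the risk is therefore the same as maximizing the posterior:
\[
\arg\min_j R(\alpha_j \mid X) = \arg\max_j P(\omega_j \mid X)
\]
```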

16 Performance classification paradigms
- Against ground truth (manually generated segmentation/classification): the method of preference in medical image segmentation
- Benchmarking: for mature/maturing subfields in computer vision
  - Example 1: "The gait identification challenge problem: datasets and baseline algorithm", in International Conference on Pattern Recognition 2002
  - Example 2: "Benchmark Studies on Face Recognition", in International Workshop on Automatic Face- and Gesture- Recognition

17 Evaluation of classifiers
- ROC analysis
- Precision and recall
- Confusion matrices

18 ROC analysis
- ROC stands for receiver operating characteristic (also written receiver-operator characteristic); it was initially used to analyze and compare the performance of human radar operators.
- A ROC curve is a plot of the true positive rate against the false positive rate as some parameter is varied.
- 1970: ROC curves were used in medical studies; they are useful in bringing out the trade-off between the sensitivity (true positive rate) and the specificity (true negative rate, i.e. 1 - false positive rate) of diagnostic trials.
- Computer vision performs ROC analysis for algorithms. We can also compare different algorithms that are designed for the same task.

19 ROC terminology
Four kinds of outcomes, two of which are errors:
- TP: say yes and are right (true positives), a hit
- TN: say no and are right (true negatives), a correct rejection
- FP: say yes and are wrong (false positives), a false alarm
- FN: say no and are wrong (false negatives), a miss
We don't really need all four rates, because FN rate = 1 - TP rate and TN rate = 1 - FP rate.
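The relation between the four rates can be checked numerically. The sketch below uses hypothetical counts and an illustrative function name; it is not taken from the lecture.

```python
def outcome_rates(tp, fn, fp, tn):
    """Convert raw outcome counts into the four rates.

    TP and FN rates are normalized by the number of actual positives,
    FP and TN rates by the number of actual negatives.
    """
    positives = tp + fn
    negatives = fp + tn
    return {
        "tpr": tp / positives,  # hit rate (sensitivity)
        "fnr": fn / positives,  # miss rate
        "fpr": fp / negatives,  # false alarm rate
        "tnr": tn / negatives,  # correct rejection rate (specificity)
    }


# Hypothetical counts: 50 actual positives, 50 actual negatives.
rates = outcome_rates(tp=40, fn=10, fp=5, tn=45)
print(rates)
```

Because each pair of rates shares a denominator, the TP and FN rates sum to 1, as do the FP and TN rates, which is why two of the four are enough.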

20 False positives, false negatives 20

21 ROC curves
- There is a trade-off between the true positive rate and the false positive rate: an increase in the true positive rate is accompanied by an increase in the false positive rate.
- The area under each curve gives a measure of accuracy.

22 ROC curve
- The closer the curve approaches the top left-hand corner of the plot, the more accurate the classifier.
- The closer the curve is to the 45° diagonal, the worse the classifier.
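A ROC curve and its area can be computed by sweeping a decision threshold over classifier scores. The following is a minimal sketch in plain Python with made-up scores and labels; the function names are illustrative, and it assumes that higher scores mean "more likely positive" and that both classes are present.

```python
def roc_points(scores, labels):
    """Return ROC points (fpr, tpr), one per distinct threshold.

    scores -- classifier scores, higher means more likely positive
    labels -- ground truth, 1 for positive and 0 for negative
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing detected
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def area_under_curve(points):
    """Trapezoidal area under a list of (fpr, tpr) points sorted by fpr."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Made-up example: six samples, three positives and three negatives.
SCORES = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
LABELS = [1, 1, 0, 1, 0, 0]

curve = roc_points(SCORES, LABELS)
print(curve)
print("AUC:", area_under_curve(curve))
```

A curve that hugs the top left-hand corner gives an area close to 1, while a curve along the diagonal gives an area of about 0.5.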

23 Where are ROC curves helpful?
Detection-type problems:
- Face detection in images/video data
- Event detection in video data
- Lesion detection in medical images
- Etc.

24 Precision and recall
- Also used mostly for detection-type problems
- In a multiple-class case, can be measured for each class

precision = (no. of correct detections) / (total number of detections) = true C1 / (true C1 + false alarms)

recall = (no. of correct detections) / (total number of C1 samples in database) = true C1 / (true C1 + missed detections)

25 Trade-off between precision and recall
Example: content-based image retrieval
- Suppose we aim at detecting all sunset images in an image database.
- The image database contains 200 sunset images.
- The classifier retrieves 150 of the relevant 200 images and 100 images of no interest to the user.
- Precision = 150/250 = 60%
- Recall = 150/200 = 75%
- The system could obtain 100 percent recall if it returned all images in the database, but its precision would be terrible.
- If we aim at a low false alarm rate, precision would be high and recall would be low.
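The sunset example can be reproduced with a short Python sketch. The function name is illustrative, and the database size of 1000 images used for the "return everything" case is a hypothetical value, since the slide does not give one.

```python
def precision_recall(true_detections, false_alarms, missed_detections):
    """Precision and recall for one class from detection counts."""
    precision = true_detections / (true_detections + false_alarms)
    recall = true_detections / (true_detections + missed_detections)
    return precision, recall


# Slide example: 150 relevant images retrieved, 100 irrelevant images
# retrieved, and 200 - 150 = 50 sunset images missed.
precision, recall = precision_recall(150, 100, 50)
print(f"precision = {precision:.0%}, recall = {recall:.0%}")

# Returning every image (hypothetical database of 1000 images):
# all 200 sunsets are found, but 800 false alarms come with them.
precision_all, recall_all = precision_recall(200, 800, 0)
print(f"precision = {precision_all:.0%}, recall = {recall_all:.0%}")
```

The second call illustrates the trade-off: recall reaches 100 percent only because precision drops to the fraction of sunset images in the whole database.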

26 Confusion matrix Used for visualizing/reporting results of a classification system 26

27 The binary confusion matrix We can construct a binary confusion matrix for one class 27

28 Calculating the precision and recall from the confusion matrix
Example. Consider the confusion matrix of an OCR system that produces the following output over a test document set. Calculate the precision and recall for class a.
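Since the matrix from the slide is not reproduced in this text, the sketch below uses a hypothetical three-class confusion matrix to show the calculation. It assumes rows hold the true class and columns the predicted class; the class names and counts are made up for illustration.

```python
CLASSES = ["a", "b", "c"]

# Hypothetical confusion matrix: MATRIX[i][j] is the number of samples whose
# true class is CLASSES[i] and whose predicted class is CLASSES[j].
MATRIX = [
    [8, 1, 1],  # true a
    [3, 6, 1],  # true b
    [1, 2, 7],  # true c
]


def class_precision_recall(matrix, k):
    """Precision and recall for class index k of a confusion matrix."""
    true_k = matrix[k][k]
    predicted_k = sum(row[k] for row in matrix)  # column sum: all detections of k
    actual_k = sum(matrix[k])                    # row sum: all true samples of k
    return true_k / predicted_k, true_k / actual_k


precision_a, recall_a = class_precision_recall(MATRIX, CLASSES.index("a"))
print(f"class a: precision = {precision_a:.3f}, recall = {recall_a:.3f}")
```

The column sum minus the diagonal entry gives the false alarms for the class, and the row sum minus the diagonal entry gives its missed detections, which matches the definitions on the precision and recall slide.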

Weka (http://www.cs.waikato.ac.nz/ml/weka/)
The phases into which a classifier's design can be divided are reflected in WEKA's Explorer structure:
- Data pre-processing (filtering) and representation
- Supervised