Data Mining: Model Evaluation
April 16, 2013
Issues: Evaluating Classification Methods
- Accuracy
  - classifier accuracy: predicting class label
  - predictor accuracy: guessing value of predicted attributes
- Speed
  - time to construct the model (training time)
  - time to use the model (classification/prediction time)
- Robustness: handling noise and missing values
- Scalability: efficiency in disk-resident databases
- Interpretability: understanding and insight provided by the model
- Other measures, e.g., goodness of rules, such as decision tree size or compactness of classification rules
Predictor Error Measures
- Measure predictor accuracy: measure how far off the predicted value is from the actual known value
- Loss function: measures the error between the actual value $y_i$ and the predicted value $y_i'$
  - Absolute error: $|y_i - y_i'|$
  - Squared error: $(y_i - y_i')^2$
- Test error (generalization error): the average loss over the test set
  - Mean absolute error: $\frac{1}{d}\sum_{i=1}^{d}|y_i - y_i'|$
  - Mean squared error: $\frac{1}{d}\sum_{i=1}^{d}(y_i - y_i')^2$
  - Relative absolute error: $\frac{\sum_{i=1}^{d}|y_i - y_i'|}{\sum_{i=1}^{d}|y_i - \bar{y}|}$
  - Relative squared error: $\frac{\sum_{i=1}^{d}(y_i - y_i')^2}{\sum_{i=1}^{d}(y_i - \bar{y})^2}$
- The mean squared error exaggerates the presence of outliers
- Popularly used: the (square) root mean squared error; similarly, the root relative squared error
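To make the definitions concrete, here is a minimal sketch of these measures in Python/NumPy (not part of the original slides); the arrays y_true and y_pred in the example are hypothetical actual and predicted values:

```python
import numpy as np

def error_measures(y_true, y_pred):
    """Compute the predictor error measures defined above."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    abs_err = np.abs(y_true - y_pred)           # |y_i - y_i'|
    sq_err = (y_true - y_pred) ** 2             # (y_i - y_i')^2
    mean_dev = np.abs(y_true - y_true.mean())   # |y_i - y_bar|, for the relative errors
    return {
        "mean_absolute_error": abs_err.mean(),
        "mean_squared_error": sq_err.mean(),
        "root_mean_squared_error": np.sqrt(sq_err.mean()),
        "relative_absolute_error": abs_err.sum() / mean_dev.sum(),
        "relative_squared_error": sq_err.sum() / (mean_dev ** 2).sum(),
        "root_relative_squared_error": np.sqrt(sq_err.sum() / (mean_dev ** 2).sum()),
    }

# Hypothetical example: actual vs. predicted values on a small test set
print(error_measures([3.0, 5.0, 2.5, 7.0], [2.5, 5.0, 4.0, 8.0]))
```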
Evaluating the Accuracy of a Classifier or Predictor (I)
- Holdout method
  - Given data is randomly partitioned into two independent sets
    - Training set (e.g., 2/3) for model construction
    - Test set (e.g., 1/3) for accuracy estimation
  - Random sampling: a variation of holdout
    - Repeat holdout k times; accuracy = avg. of the accuracies obtained
- Cross-validation (k-fold, where k = 10 is most popular; see the sketch after this list)
  - Randomly partition the data into k mutually exclusive subsets, each of approximately equal size
  - At the i-th iteration, use D_i as the test set and the others as the training set
  - Leave-one-out: k folds where k = # of tuples, for small-sized data
  - Stratified cross-validation: folds are stratified so that the class distribution in each fold is approximately the same as that in the initial data
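A minimal sketch of k-fold cross-validation in Python/NumPy, assuming a hypothetical train_fn(X, y) that returns a fitted model exposing a predict method:

```python
import numpy as np

def k_fold_cross_validation(X, y, train_fn, k=10, seed=0):
    """Randomly partition the data into k mutually exclusive folds of
    roughly equal size; at the i-th iteration, fold i is the test set
    and the remaining k-1 folds form the training set."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))          # random partition of tuple indices
    folds = np.array_split(idx, k)
    accuracies = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = train_fn(X[train], y[train])     # hypothetical training routine
        acc = np.mean(model.predict(X[test]) == y[test])
        accuracies.append(acc)
    return np.mean(accuracies)                   # avg. accuracy over the k folds
```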
Evaluating the Accuracy of a Classifier or Predictor (II)
- Bootstrap
  - Works well with small data sets
  - Samples the given training tuples uniformly with replacement, i.e., each time a tuple is selected, it is equally likely to be selected again and re-added to the training set
- There are several bootstrap methods; a common one is the .632 bootstrap
  - Suppose we are given a data set of d tuples. The data set is sampled d times, with replacement, resulting in a training set of d samples. The data tuples that did not make it into the training set end up forming the test set. About 63.2% of the original data will end up in the bootstrap sample, and the remaining 36.8% will form the test set (since $(1 - 1/d)^d \approx e^{-1} = 0.368$)
  - Repeat the sampling procedure k times; overall accuracy of the model:
    $acc(M) = \frac{1}{k}\sum_{i=1}^{k}\left(0.632 \times acc(M_i)_{test\_set} + 0.368 \times acc(M_i)_{train\_set}\right)$
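A matching sketch of the .632 bootstrap under the same assumption of a hypothetical train_fn returning a model with a predict method:

```python
import numpy as np

def bootstrap_632(X, y, train_fn, k=10, seed=0):
    """.632 bootstrap: sample d tuples with replacement as the training
    set; tuples never drawn form the test set (~36.8% of the data)."""
    rng = np.random.default_rng(seed)
    d = len(y)
    scores = []
    for _ in range(k):
        train = rng.integers(0, d, size=d)            # sample d times with replacement
        test = np.setdiff1d(np.arange(d), train)      # tuples that never made it in
        if len(test) == 0:                            # extremely unlikely for realistic d
            continue
        model = train_fn(X[train], y[train])          # hypothetical training routine
        acc_test = np.mean(model.predict(X[test]) == y[test])
        acc_train = np.mean(model.predict(X[train]) == y[train])
        scores.append(0.632 * acc_test + 0.368 * acc_train)
    return np.mean(scores)                            # overall accuracy estimate
```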
Model Evaluation
- Metrics for Performance Evaluation
  - How to evaluate the performance of a model?
- Methods for Performance Evaluation
  - How to obtain reliable estimates?
- Methods for Model Comparison
  - How to compare the relative performance among competing models?
Metrics for Performance Evaluation
- Focus on the predictive capability of a model
  - rather than how long it takes to classify or build models, scalability, etc.
- Confusion matrix:

                          PREDICTED CLASS
                          Class=Yes    Class=No
  ACTUAL    Class=Yes     a (TP)       b (FN)
  CLASS     Class=No      c (FP)       d (TN)

- a: TP (true positive), b: FN (false negative), c: FP (false positive), d: TN (true negative)
Metrics for Performance Evaluation

                          PREDICTED CLASS
                          Class=Yes    Class=No
  ACTUAL    Class=Yes     a (TP)       b (FN)
  CLASS     Class=No      c (FP)       d (TN)

- Most widely-used metric:
  $\text{Accuracy} = \frac{a + d}{a + b + c + d} = \frac{TP + TN}{TP + TN + FP + FN}$
Classifier Accuracy Measures

                         Predicted class
  classes             buy_computer=yes   buy_computer=no   total   recognition(%)
  buy_computer=yes          6954                46          7000        99.34
  buy_computer=no            412              2588          3000        86.27
  total                     7366              2634         10000        95.42

- Accuracy of a classifier M, acc(M): percentage of test-set tuples that are correctly classified by the model M
- Error rate (misclassification rate) of M = 1 - acc(M)
- Given m classes, CM_{i,j}, an entry in a confusion matrix, indicates the # of tuples in class i that are labeled by the classifier as class j
- Alternative accuracy measures (e.g., for cancer diagnosis):
  - sensitivity = TP/(TP + FN)  /* true positive recognition rate */
  - specificity = TN/(TN + FP)  /* true negative recognition rate */
- This model can also be used for cost-benefit analysis
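The measures above can be checked against the buy_computer matrix with a few lines of Python (the counts come straight from the table; everything else is arithmetic):

```python
# Counts from the buy_computer confusion matrix above
TP, FN = 6954, 46     # actual class = yes
FP, TN = 412, 2588    # actual class = no

accuracy    = (TP + TN) / (TP + TN + FP + FN)   # 0.9542
error_rate  = 1 - accuracy                      # 0.0458
sensitivity = TP / (TP + FN)                    # 0.9934, true positive recognition rate
specificity = TN / (TN + FP)                    # 0.8627, true negative recognition rate
print(accuracy, error_rate, sensitivity, specificity)
```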
Limitation of Accuracy
- Consider a 2-class problem
  - Number of Class 0 examples = 9990
  - Number of Class 1 examples = 10
- If the model predicts everything to be class 0, accuracy is 9990/10000 = 99.9%
- Accuracy is misleading because the model does not detect any class 1 example
Cost Matrix

                          PREDICTED CLASS
  C(i|j)                  Class=Yes     Class=No
  ACTUAL    Class=Yes     C(Yes|Yes)    C(No|Yes)
  CLASS     Class=No      C(Yes|No)     C(No|No)

- C(i|j): cost of misclassifying a class j example as class i
Computing Cost of Classification

  Cost matrix:
                      PREDICTED CLASS
  C(i|j)              +        -
  ACTUAL    +        -1      100
  CLASS     -         1        0

  Model M1 (Accuracy = 80%, Cost = 3910):
                      PREDICTED CLASS
                      +        -
  ACTUAL    +       150       40
  CLASS     -        60      250

  Model M2 (Accuracy = 90%, Cost = 4255):
                      PREDICTED CLASS
                      +        -
  ACTUAL    +       250       45
  CLASS     -         5      200
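A short Python check of the numbers above: multiply each confusion matrix elementwise by the cost matrix C(i|j) and sum, and take the trace over the total for accuracy.

```python
import numpy as np

# Cost matrix C(i|j) and the two confusion matrices from the slide;
# rows = actual class (+, -), columns = predicted class (+, -)
cost = np.array([[-1, 100],
                 [ 1,   0]])
M1 = np.array([[150,  40],
               [ 60, 250]])
M2 = np.array([[250,  45],
               [  5, 200]])

for name, cm in [("M1", M1), ("M2", M2)]:
    accuracy = np.trace(cm) / cm.sum()      # correct predictions / all predictions
    total_cost = (cm * cost).sum()          # elementwise product, then sum
    print(name, accuracy, total_cost)       # M1: 0.80, 3910; M2: 0.90, 4255
```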
Cost vs Accuracy

  Count:
                          PREDICTED CLASS
                          Class=Yes   Class=No
  ACTUAL    Class=Yes        a           b
  CLASS     Class=No         c           d

  Cost:
                          PREDICTED CLASS
                          Class=Yes   Class=No
  ACTUAL    Class=Yes        p           q
  CLASS     Class=No         q           p

- Accuracy is proportional to cost if
  1. C(Yes|No) = C(No|Yes) = q
  2. C(Yes|Yes) = C(No|No) = p
- N = a + b + c + d
- Accuracy = (a + d)/N
- Cost = p(a + d) + q(b + c)
       = p(a + d) + q(N - a - d)
       = qN - (q - p)(a + d)
       = N[q - (q - p) × Accuracy]
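A quick numeric sanity check of this identity, using hypothetical confusion-matrix counts and the assumed costs p (correct prediction) and q (misclassification):

```python
# Verify Cost = N * [q - (q - p) * Accuracy] for uniform costs p and q
p, q = 0, 1                       # assumed costs: correct = 0, wrong = 1
a, b, c, d = 40, 10, 5, 45        # hypothetical confusion-matrix counts
N = a + b + c + d
accuracy = (a + d) / N
direct_cost = p * (a + d) + q * (b + c)        # cost summed cell by cell
closed_form = N * (q - (q - p) * accuracy)     # cost via the derived formula
assert direct_cost == closed_form              # both equal 15 here
```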
Cost-Sensitive Measures
- Precision: $p = \frac{a}{a + c}$
- Recall: $r = \frac{a}{a + b}$
- F-measure: $F = \frac{2rp}{r + p} = \frac{2a}{2a + b + c}$
- Precision is biased towards C(Yes|Yes) & C(Yes|No)
- Recall is biased towards C(Yes|Yes) & C(No|Yes)
- F-measure is biased towards all except C(No|No)
- Weighted accuracy: $\frac{w_1 a + w_4 d}{w_1 a + w_2 b + w_3 c + w_4 d}$
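A minimal sketch computing these three measures, reusing the buy_computer counts from the earlier slide as an illustration:

```python
def precision_recall_f(a, b, c):
    """a = TP, b = FN, c = FP (d = TN is not used by these measures)."""
    p = a / (a + c)              # precision
    r = a / (a + b)              # recall
    f = 2 * r * p / (r + p)      # F-measure, equivalently 2a / (2a + b + c)
    return p, r, f

# buy_computer example: precision ~0.944, recall ~0.993, F ~0.968
print(precision_recall_f(6954, 46, 412))
```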
Model Evaluation
- Metrics for Performance Evaluation
  - How to evaluate the performance of a model?
- Methods for Performance Evaluation
  - How to obtain reliable estimates?
- Methods for Model Comparison
  - How to compare the relative performance among competing models?
Methods for Performance Evaluation
- How to obtain a reliable estimate of performance?
- Performance of a model may depend on other factors besides the learning algorithm:
  - Class distribution
  - Cost of misclassification
  - Size of training and test sets
Learning Curve
- A learning curve shows how accuracy changes with varying sample size
- Requires a sampling schedule for creating the learning curve:
  - Arithmetic sampling (Langley et al.)
  - Geometric sampling (Provost et al.)
- Effect of small sample size:
  - Bias in the estimate
  - Variance of the estimate
Holdout Methods of Estimation
- Holdout: reserve 2/3 for training and 1/3 for testing
- Random subsampling: repeated holdout
- Cross-validation
  - Partition data into k disjoint subsets
  - k-fold: train on k-1 partitions, test on the remaining one
  - Leave-one-out: k = n
- Stratified sampling: oversampling vs. undersampling
- Bootstrap: sampling with replacement
Model Evaluation
- Metrics for Performance Evaluation
  - How to evaluate the performance of a model?
- Methods for Performance Evaluation
  - How to obtain reliable estimates?
- Methods for Model Comparison
  - How to compare the relative performance among competing models?
ROC (Receiver Operating Characteristic)
- Developed in the 1950s for signal detection theory to analyze noisy signals
  - Characterizes the trade-off between positive hits and false alarms
- ROC curve plots TP (on the y-axis) against FP (on the x-axis)
- Performance of each classifier is represented as a point on the ROC curve
  - Changing the threshold of the algorithm, sample distribution, or cost matrix changes the location of the point
ROC Curve
- 1-dimensional data set containing 2 classes (positive and negative)
- Any point located at x > t is classified as positive
- At threshold t: TP = 0.5, FN = 0.5, FP = 0.12, TN = 0.88
ROC Curve
- (TP, FP):
  - (0,0): declare everything to be negative class
  - (1,1): declare everything to be positive class
  - (1,0): ideal
- Diagonal line: random guessing
  - Below the diagonal line: prediction is opposite of the true class
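A minimal sketch of how such ROC points can be generated by sweeping the decision threshold over classifier scores (the scores and labels in the example are hypothetical):

```python
import numpy as np

def roc_points(scores, labels):
    """Sweep the threshold over all distinct scores and return the
    (FP, TP) rate pair for each threshold; labels are 1 (pos) / 0 (neg)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    P, N = (labels == 1).sum(), (labels == 0).sum()
    points = [(0.0, 0.0)]                        # declare everything negative
    for t in np.sort(np.unique(scores))[::-1]:   # thresholds, high to low
        pred = scores >= t                       # classify as positive above t
        tp = (pred & (labels == 1)).sum() / P    # TP rate (y-axis)
        fp = (pred & (labels == 0)).sum() / N    # FP rate (x-axis)
        points.append((fp, tp))
    return points                                # ends at (1,1): everything positive

# Hypothetical classifier scores, higher = more likely positive
print(roc_points([0.9, 0.8, 0.7, 0.4, 0.3], [1, 1, 0, 1, 0]))
```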
Using ROC for Model Comparison
- In general, no model consistently outperforms the other
  - M1 is better for small FPR
  - M2 is better for large FPR