Slides for Data Mining by I. H. Witten and E. Frank
Simplicity first

- Simple algorithms often work very well!
- There are many kinds of simple structure, e.g.:
  - One attribute does all the work
  - All attributes contribute equally and independently
  - A weighted linear combination might do
  - Instance-based: use a few prototypes
  - Use simple logical rules
- Success of method depends on the domain
Inferring rudimentary rules

- 1R: learns a 1-level decision tree
  - i.e., rules that all test one particular attribute
- Basic version
  - One branch for each value
  - Each branch assigns most frequent class
  - Error rate: proportion of instances that don't belong to the
    majority class of their corresponding branch
  - Choose attribute with lowest error rate
  - (assumes nominal attributes)
Pseudo-code for 1R

For each attribute,
  For each value of the attribute, make a rule as follows:
    count how often each class appears
    find the most frequent class
    make the rule assign that class to this attribute-value
  Calculate the error rate of the rules
Choose the rules with the smallest error rate

- Note: "missing" is treated as a separate attribute value
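The pseudo-code above translates almost directly into Python. A minimal sketch; the function name, data layout, and toy dataset fragment are illustrative, not from the slides:

```python
from collections import Counter, defaultdict

def one_r(instances, attributes, class_index):
    """1R: for each attribute, build one rule per value (predict the
    majority class) and keep the attribute with the fewest errors.
    `instances` is a list of tuples; `attributes` maps name -> index."""
    best = None  # (errors, attribute, rules)
    for attr, idx in attributes.items():
        # Count classes per attribute value ("missing" would simply be
        # another value here)
        counts = defaultdict(Counter)
        for inst in instances:
            counts[inst[idx]][inst[class_index]] += 1
        rules = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        errors = sum(sum(c.values()) - max(c.values())
                     for c in counts.values())
        if best is None or errors < best[0]:
            best = (errors, attr, rules)
    return best

# Illustrative fragment of weather-style data: (outlook, windy, play)
data = [("sunny", False, "no"), ("sunny", True, "no"),
        ("overcast", False, "yes"), ("rainy", False, "yes"),
        ("rainy", True, "no"), ("overcast", True, "yes")]
errors, attr, rules = one_r(data, {"outlook": 0, "windy": 1}, 2)
print(attr, rules, errors)  # outlook wins with 1 error on this fragment
```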
Evaluating the weather attributes

Weather data:

  Outlook   Temp  Humidity  Windy  Play
  Sunny     Hot   High      False  No
  Sunny     Hot   High      True   No
  Overcast  Hot   High      False  Yes
  Rainy     Mild  High      False  Yes
  Rainy     Cool  Normal    False  Yes
  Rainy     Cool  Normal    True   No
  Overcast  Cool  Normal    True   Yes
  Sunny     Mild  High      False  No
  Sunny     Cool  Normal    False  Yes
  Rainy     Mild  Normal    False  Yes
  Sunny     Mild  Normal    True   Yes
  Overcast  Mild  High      True   Yes
  Overcast  Hot   Normal    False  Yes
  Rainy     Mild  High      True   No

1R evaluation:

  Attribute  Rules            Errors  Total errors
  Outlook    Sunny -> No      2/5     4/14
             Overcast -> Yes  0/4
             Rainy -> Yes     2/5
  Temp       Hot -> No*       2/4     5/14
             Mild -> Yes      2/6
             Cool -> Yes      1/4
  Humidity   High -> No       3/7     4/14
             Normal -> Yes    1/7
  Windy      False -> Yes     2/8     5/14
             True -> No*      3/6

  * indicates a tie
Dealing with numeric attributes

- Discretize numeric attributes
- Divide each attribute's range into intervals
  - Sort instances according to attribute's values
  - Place breakpoints where the class changes (the majority class)
  - This minimizes the total error
- Example: temperature from weather data

  Outlook   Temperature  Humidity  Windy  Play
  Sunny     85           85       False  No
  Sunny     80           90       True   No
  Overcast  83           86       False  Yes
  Rainy     75           80       False  Yes
  ...

  64  65  68  69  70  71  72  72  75  75  80  81  83  85
  Yes No  Yes Yes Yes No  No  Yes Yes Yes No  Yes Yes No
The problem of overfitting

- This procedure is very sensitive to noise
  - One instance with an incorrect class label will probably produce a
    separate interval
- Also: time stamp attribute will have zero errors
- Simple solution: enforce minimum number of instances in majority
  class per interval
- Example (with min = 3):

  64  65  68  69  70 | 71  72  72  75  75 | 80  81  83  85
  Yes No  Yes Yes Yes| No  No  Yes Yes Yes| No  Yes Yes No

  Merging adjacent intervals with the same majority class leaves a
  single breakpoint at 77.5:

  64  65  68  69  70  71  72  72  75  75 | 80  81  83  85
  Yes No  Yes Yes Yes No  No  Yes Yes Yes| No  Yes Yes No
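The min-instances fix can be sketched in Python. This is a simplified, illustrative version of the procedure, not the book's exact algorithm: intervals are closed greedily once the majority class has enough members, then adjacent intervals with the same majority class are merged.

```python
from collections import Counter

def discretize(values, classes, min_majority=3):
    """Sketch of 1R discretization with overfitting avoidance: sweep
    the sorted values, close an interval once its majority class has
    at least `min_majority` members and the next example has a
    different class, then merge same-majority neighbours."""
    pairs = sorted(zip(values, classes))
    intervals, count = [], Counter()
    for i, (v, c) in enumerate(pairs):
        count[c] += 1
        nxt = pairs[i + 1] if i + 1 < len(pairs) else None
        done = nxt is None
        if done or (max(count.values()) >= min_majority
                    and nxt[1] != c and nxt[0] != v):
            majority = count.most_common(1)[0][0]
            cut = None if done else (v + nxt[0]) / 2  # e.g. 77.5
            intervals.append((cut, majority))
            count = Counter()
    # Merge adjacent intervals that share the same majority class
    merged = [intervals[0]]
    for cut, maj in intervals[1:]:
        if maj == merged[-1][1]:
            merged[-1] = (cut, maj)
        else:
            merged.append((cut, maj))
    return merged

temps = [64, 65, 68, 69, 70, 71, 72, 72, 75, 75, 80, 81, 83, 85]
play  = ["yes", "no", "yes", "yes", "yes", "no", "no", "yes", "yes",
         "yes", "no", "yes", "yes", "no"]
result = discretize(temps, play)
print(result)  # [(77.5, 'yes'), (None, 'no')] -- one breakpoint at 77.5
```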
With overfitting avoidance

Resulting rule set:

  Attribute    Rules                     Errors  Total errors
  Outlook      Sunny -> No               2/5     4/14
               Overcast -> Yes           0/4
               Rainy -> Yes              2/5
  Temperature  <= 77.5 -> Yes            3/10    5/14
               > 77.5 -> No*             2/4
  Humidity     <= 82.5 -> Yes            1/7     3/14
               > 82.5 and <= 95.5 -> No  2/6
               > 95.5 -> Yes             0/1
  Windy        False -> Yes              2/8     5/14
               True -> No*               3/6
Discussion of 1R

- 1R was described in a paper by Holte (1993)
  - Contains an experimental evaluation on 16 datasets (using
    cross-validation so that results were representative of
    performance on future data)
  - Minimum number of instances was set to 6 after some
    experimentation
  - 1R's simple rules performed not much worse than much more complex
    decision trees
- Simplicity first pays off!

  "Very Simple Classification Rules Perform Well on Most Commonly
  Used Datasets"
  Robert C. Holte, Computer Science Department, University of Ottawa
Covering algorithms

- Convert decision tree into a rule set
  - Straightforward, but rule set overly complex
  - More effective conversions are not trivial
- Instead, can generate rule set directly
  - for each class in turn, find rule set that covers all instances in
    it (excluding instances not in the class)
- Called a covering approach:
  - at each stage a rule is identified that covers some of the
    instances
Example: generating a rule

[Figure: instances of classes a and b plotted in x-y space, with
splits at x = 1.2 and y = 2.6]

Rule for class a, refined step by step:

  If true then class = a
  If x > 1.2 then class = a
  If x > 1.2 and y > 2.6 then class = a

Possible rule set for class b:

  If x <= 1.2 then class = b
  If x > 1.2 and y <= 2.6 then class = b

- Could add more rules, get "perfect" rule set
Rules vs. trees

- Corresponding decision tree:
  (produces exactly the same predictions)
- But: rule sets can be more perspicuous when decision trees suffer
  from replicated subtrees
- Also: in multiclass situations, covering algorithm concentrates on
  one class at a time whereas decision tree learner takes all classes
  into account
Simple covering algorithm

- Generates a rule by adding tests that maximize rule's accuracy
- Similar to situation in decision trees: problem of selecting an
  attribute to split on
  - But: decision tree inducer maximizes overall purity
- Each new test reduces rule's coverage:

  [Figure: space of examples -> rule so far -> rule after adding new
  term]
Selecting a test

- Goal: maximize accuracy
  - t: total number of instances covered by rule
  - p: positive examples of the class covered by rule
  - t - p: number of errors made by rule
- Select test that maximizes the ratio p/t
- We are finished when p/t = 1 or the set of instances can't be split
  any further
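Selecting the test with maximum p/t is a short loop over candidate (attribute, value) pairs. A sketch on a tiny hypothetical dataset; the column names and values are made up for illustration:

```python
def best_test(instances, target_class, class_index, candidates):
    """Pick the (attribute index, value) test maximizing p/t,
    breaking ties in favour of larger p, as described on the slide."""
    scored = []
    for idx, val in candidates:
        covered = [x for x in instances if x[idx] == val]
        t = len(covered)  # total instances covered by the test
        p = sum(1 for x in covered if x[class_index] == target_class)
        if t:  # skip tests that cover nothing
            scored.append((p / t, p, (idx, val)))
    return max(scored)[2]

# Tiny hypothetical dataset: (astigmatism, tear_rate, lens)
data = [("yes", "reduced", "none"), ("yes", "normal", "hard"),
        ("no",  "normal",  "soft"), ("yes", "normal", "hard"),
        ("yes", "reduced", "hard")]
tests = [(0, "yes"), (0, "no"), (1, "normal"), (1, "reduced")]
chosen = best_test(data, "hard", 2, tests)
print(chosen)  # (0, 'yes'): covers 4 instances, 3 of them 'hard'
```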
Example: contact lens data

Rule we seek:

  If ? then recommendation = hard

Possible tests:

  Age = Young                            2/8
  Age = Pre-presbyopic                   1/8
  Age = Presbyopic                       1/8
  Spectacle prescription = Myope         3/12
  Spectacle prescription = Hypermetrope  1/12
  Astigmatism = no                       0/12
  Astigmatism = yes                      4/12
  Tear production rate = Reduced         0/12
  Tear production rate = Normal          4/12
Modified rule and resulting data

Rule with best test added:

  If astigmatism = yes then recommendation = hard

Instances covered by modified rule:

  Age             Spectacle     Astigmatism  Tear production  Recommended
                  prescription               rate             lenses
  Young           Myope         Yes          Reduced          None
  Young           Myope         Yes          Normal           Hard
  Young           Hypermetrope  Yes          Reduced          None
  Young           Hypermetrope  Yes          Normal           Hard
  Pre-presbyopic  Myope         Yes          Reduced          None
  Pre-presbyopic  Myope         Yes          Normal           Hard
  Pre-presbyopic  Hypermetrope  Yes          Reduced          None
  Pre-presbyopic  Hypermetrope  Yes          Normal           None
  Presbyopic      Myope         Yes          Reduced          None
  Presbyopic      Myope         Yes          Normal           Hard
  Presbyopic      Hypermetrope  Yes          Reduced          None
  Presbyopic      Hypermetrope  Yes          Normal           None
Further refinement

Current state:

  If astigmatism = yes and ? then recommendation = hard

Possible tests:

  Age = Young                            2/4
  Age = Pre-presbyopic                   1/4
  Age = Presbyopic                       1/4
  Spectacle prescription = Myope         3/6
  Spectacle prescription = Hypermetrope  1/6
  Tear production rate = Reduced         0/6
  Tear production rate = Normal          4/6
Modified rule and resulting data

Rule with best test added:

  If astigmatism = yes and tear production rate = normal
  then recommendation = hard

Instances covered by modified rule:

  Age             Spectacle     Astigmatism  Tear production  Recommended
                  prescription               rate             lenses
  Young           Myope         Yes          Normal           Hard
  Young           Hypermetrope  Yes          Normal           Hard
  Pre-presbyopic  Myope         Yes          Normal           Hard
  Pre-presbyopic  Hypermetrope  Yes          Normal           None
  Presbyopic      Myope         Yes          Normal           Hard
  Presbyopic      Hypermetrope  Yes          Normal           None
Further refinement

Current state:

  If astigmatism = yes and tear production rate = normal and ?
  then recommendation = hard

Possible tests:

  Age = Young                            2/2
  Age = Pre-presbyopic                   1/2
  Age = Presbyopic                       1/2
  Spectacle prescription = Myope         3/3
  Spectacle prescription = Hypermetrope  1/3

- Tie between the first and the fourth test
  - We choose the one with greater coverage
The result

Final rule:

  If astigmatism = yes and tear production rate = normal
  and spectacle prescription = myope then recommendation = hard

Second rule for recommending "hard lenses":
(built from instances not covered by first rule)

  If age = young and astigmatism = yes
  and tear production rate = normal then recommendation = hard

- These two rules cover all "hard lenses"
- Process is repeated with other two classes
Pseudo-code for PRISM

For each class C
  Initialize E to the instance set
  While E contains instances in class C
    Create a rule R with an empty left-hand side that predicts class C
    Until R is perfect (or there are no more attributes to use) do
      For each attribute A not mentioned in R, and each value v,
        Consider adding the condition A = v to the left-hand side of R
      Select A and v to maximize the accuracy p/t
        (break ties by choosing the condition with the largest p)
      Add A = v to R
    Remove the instances covered by R from E
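A compact Python transcription of the pseudo-code, as a sketch for nominal attributes; the weather-style toy dataset is illustrative, not from the slides:

```python
def prism(instances, n_attrs, class_index):
    """PRISM sketch following the pseudo-code above: for each class,
    repeatedly grow a rule (a list of (attribute, value) conditions)
    by maximizing p/t, then remove the instances it covers."""
    rules = []
    for cls in sorted({x[class_index] for x in instances}):
        E = list(instances)
        while any(x[class_index] == cls for x in E):
            conds, covered = [], E
            # Grow R until it is perfect or no attributes remain
            while any(x[class_index] != cls for x in covered) \
                    and len(conds) < n_attrs:
                used = {a for a, _ in conds}
                candidates = []
                for a in range(n_attrs):
                    if a in used:
                        continue
                    for v in {x[a] for x in covered}:
                        t = sum(1 for x in covered if x[a] == v)
                        p = sum(1 for x in covered
                                if x[a] == v and x[class_index] == cls)
                        candidates.append((p / t, p, (a, v)))
                # Maximize p/t, breaking ties by largest p
                _, _, (a, v) = max(candidates)
                conds.append((a, v))
                covered = [x for x in covered if x[a] == v]
            rules.append((conds, cls))
            E = [x for x in E if not all(x[a] == v for a, v in conds)]
    return rules

# Illustrative fragment: (outlook, humidity, play)
data = [("sunny", "high", "no"), ("sunny", "normal", "yes"),
        ("overcast", "high", "yes"), ("rainy", "high", "no"),
        ("rainy", "normal", "yes")]
rules = prism(data, 2, 2)
print(rules)  # two rules for 'no', then two for 'yes'
```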
Rules vs. decision lists

- PRISM with outer loop removed generates a decision list for one
  class
  - Subsequent rules are designed for instances that are not covered
    by previous rules
  - But: order doesn't matter because all rules predict the same class
- Outer loop considers all classes separately
  - No order dependence implied
- Problems: overlapping rules, default rule required
Separate and conquer

- Methods like PRISM (for dealing with one class) are
  separate-and-conquer algorithms:
  - First, identify a useful rule
  - Then, separate out all the instances it covers
  - Finally, "conquer" the remaining instances
- Difference to divide-and-conquer methods:
  - Subset covered by rule doesn't need to be explored any further