Slides for Data Mining by I. H. Witten and E. Frank

Simplicity first
- Simple algorithms often work very well!
- There are many kinds of simple structure, e.g.:
  - One attribute does all the work
  - All attributes contribute equally & independently
  - A weighted linear combination might do
  - Instance-based: use a few prototypes
  - Use simple logical rules
- Success of a method depends on the domain

Inferring rudimentary rules
- 1R: learns a 1-level decision tree
  - I.e., rules that all test one particular attribute
- Basic version
  - One branch for each value
  - Each branch assigns the most frequent class
  - Error rate: proportion of instances that don't belong to the majority class of their corresponding branch
  - Choose the attribute with the lowest error rate (assumes nominal attributes)

Pseudo-code for 1R

    For each attribute,
      For each value of the attribute, make a rule as follows:
        count how often each class appears
        find the most frequent class
        make the rule assign that class to this attribute-value
      Calculate the error rate of the rules
    Choose the rules with the smallest error rate

Note: "missing" is treated as a separate attribute value

Evaluating the weather attributes

    Outlook   Temp  Humidity  Windy  Play
    Sunny     Hot   High      False  No
    Sunny     Hot   High      True   No
    Overcast  Hot   High      False  Yes
    Rainy     Mild  High      False  Yes
    Rainy     Cool  Normal    False  Yes
    Rainy     Cool  Normal    True   No
    Overcast  Cool  Normal    True   Yes
    Sunny     Mild  High      False  No
    Sunny     Cool  Normal    False  Yes
    Rainy     Mild  Normal    False  Yes
    Sunny     Mild  Normal    True   Yes
    Overcast  Mild  High      True   Yes
    Overcast  Hot   Normal    False  Yes
    Rainy     Mild  High      True   No

    Attribute  Rules            Errors  Total errors
    Outlook    Sunny → No       2/5     4/14
               Overcast → Yes   0/4
               Rainy → Yes      2/5
    Temp       Hot → No*        2/4     5/14
               Mild → Yes       2/6
               Cool → Yes       1/4
    Humidity   High → No        3/7     4/14
               Normal → Yes     1/7
    Windy      False → Yes      2/8     5/14
               True → No*       3/6

    * indicates a tie
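The 1R procedure is short enough to run directly. The sketch below is a minimal Python version, assuming nominal attributes stored as lowercase strings with the class in the last position; a missing value would simply be one more string (for example "?"), in line with the note that "missing" is treated as a separate value. Ties between classes go to the class encountered first and ties between attributes go to the earlier attribute; both tie-breaks are assumptions, not part of the slides. The names `one_r`, `WEATHER` and `ATTRIBUTES` are illustrative.

```python
from collections import Counter, defaultdict

ATTRIBUTES = ["outlook", "temperature", "humidity", "windy"]

# The 14 nominal weather instances: attribute values followed by the class (play).
WEATHER = [
    ("sunny", "hot", "high", "false", "no"),
    ("sunny", "hot", "high", "true", "no"),
    ("overcast", "hot", "high", "false", "yes"),
    ("rainy", "mild", "high", "false", "yes"),
    ("rainy", "cool", "normal", "false", "yes"),
    ("rainy", "cool", "normal", "true", "no"),
    ("overcast", "cool", "normal", "true", "yes"),
    ("sunny", "mild", "high", "false", "no"),
    ("sunny", "cool", "normal", "false", "yes"),
    ("rainy", "mild", "normal", "false", "yes"),
    ("sunny", "mild", "normal", "true", "yes"),
    ("overcast", "mild", "high", "true", "yes"),
    ("overcast", "hot", "normal", "false", "yes"),
    ("rainy", "mild", "high", "true", "no"),
]


def one_r(instances, n_attributes):
    """Return (best attribute index, its rules, total errors per attribute)."""
    results = []
    for a in range(n_attributes):
        # Count how often each class appears for each value of attribute a.
        counts = defaultdict(Counter)
        for row in instances:
            counts[row[a]][row[-1]] += 1
        # Each value predicts its most frequent class (ties: class seen first).
        rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        # Errors: instances not in the majority class of their branch.
        errors = sum(sum(c.values()) - c[rules[value]] for value, c in counts.items())
        results.append((errors, a, rules))
    # Smallest total error wins; ties go to the earlier attribute.
    _, best_a, best_rules = min(results, key=lambda r: (r[0], r[1]))
    return best_a, best_rules, [r[0] for r in results]


best, rules, errors = one_r(WEATHER, len(ATTRIBUTES))
print(ATTRIBUTES[best], rules, errors)
```

On this data it reproduces the error totals in the table (4, 5, 4 and 5 out of 14). Outlook and Humidity tie at 4/14, so Outlook is chosen here only because it is listed first.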

Dealing with numeric attributes
- Discretize numeric attributes
- Divide each attribute's range into intervals
  - Sort instances according to attribute's values
  - Place breakpoints where the class changes (the majority class)
  - This minimizes the total error
- Example: temperature from weather data

    Outlook   Temperature  Humidity  Windy  Play
    Sunny     85           85        False  No
    Sunny     80           90        True   No
    Overcast  83           86        False  Yes
    Rainy     75           80        False  Yes
    …         …            …         …      …

    64  65  68  69  70  71  72  72  75  75  80  81  83  85
    Yes | No | Yes Yes Yes | No No Yes | Yes Yes | No | Yes Yes | No
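A sketch of the breakpoint placement for the temperature example, assuming breakpoints sit halfway between neighbouring values. Two instances with the same value cannot be separated, so when the class changes between the two 72s this version defers the breakpoint to the next gap between distinct values (73.5); that handling is one reasonable convention and is stated here as an assumption. `class_change_breakpoints` is a hypothetical helper name.

```python
# Temperature values with their classes, as in the sorted example.
TEMPERATURE = [
    (64, "yes"), (65, "no"), (68, "yes"), (69, "yes"), (70, "yes"),
    (71, "no"), (72, "no"), (72, "yes"), (75, "yes"), (75, "yes"),
    (80, "no"), (81, "yes"), (83, "yes"), (85, "no"),
]


def class_change_breakpoints(pairs):
    """Place a breakpoint halfway between neighbouring values wherever the class changes.

    A change between two equal values cannot be split, so the breakpoint is
    deferred to the next gap between distinct values.
    """
    pairs = sorted(pairs)
    cuts = []
    pending = False  # a class change has been seen but not yet cut
    for (v0, c0), (v1, c1) in zip(pairs, pairs[1:]):
        if c0 != c1:
            pending = True
        if pending and v0 != v1:
            cuts.append((v0 + v1) / 2)
            pending = False
    return cuts


print(class_change_breakpoints(TEMPERATURE))
```

The seven breakpoints it returns (64.5, 66.5, 70.5, 73.5, 77.5, 80.5 and 84.0) split the temperatures into eight intervals, one per group in the partition of the sorted values.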

The problem of overfitting
- This procedure is very sensitive to noise
  - One instance with an incorrect class label will probably produce a separate interval
- Also: a time stamp attribute will have zero errors
- Simple solution: enforce a minimum number of instances in the majority class per interval
- Example (with min = 3):

    64  65  68  69  70  71  72  72  75  75  80  81  83  85
    Yes | No | Yes Yes Yes | No No Yes | Yes Yes | No | Yes Yes | No

    64  65  68  69  70  71  72  72  75  75  80  81  83  85
    Yes No Yes Yes Yes | No No Yes Yes Yes | No Yes Yes No
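The minimum-bucket rule can be sketched as a left-to-right scan, assuming this procedure: an interval is closed only once some class has at least `min_majority` members in it, after which it keeps absorbing instances of that class (and any instances sharing the last value, since equal values cannot be split). Adjacent intervals with the same majority class are then merged. Ties in the majority count go to the class seen first in the interval; that assumption is what makes the last interval (80, 81, 83, 85) predict "no", matching the tie marked No* in the rule set. The function names are hypothetical.

```python
from collections import Counter

# Temperature values with their classes, as in the sorted example.
TEMPERATURE = [
    (64, "yes"), (65, "no"), (68, "yes"), (69, "yes"), (70, "yes"),
    (71, "no"), (72, "no"), (72, "yes"), (75, "yes"), (75, "yes"),
    (80, "no"), (81, "yes"), (83, "yes"), (85, "no"),
]


def min_bucket_partition(pairs, min_majority=3):
    """Split sorted (value, class) pairs into intervals, closing each one only
    after some class has at least min_majority instances in it."""
    pairs = sorted(pairs)
    intervals, current, i = [], [], 0
    while i < len(pairs):
        current.append(pairs[i])
        i += 1
        majority, n = Counter(c for _, c in current).most_common(1)[0]
        if n >= min_majority:
            # Absorb further majority-class instances and never split equal values.
            while i < len(pairs) and (pairs[i][1] == majority or pairs[i][0] == current[-1][0]):
                current.append(pairs[i])
                i += 1
            intervals.append(current)
            current = []
    if current:
        intervals.append(current)  # the last interval may fall short of the minimum
    return intervals


def merge_and_cut(intervals):
    """Label intervals by majority class, merge neighbours with the same label,
    and return (breakpoints, labels)."""
    labels = [Counter(c for _, c in iv).most_common(1)[0][0] for iv in intervals]
    cuts, merged = [], [labels[0]]
    for prev, nxt, label in zip(intervals, intervals[1:], labels[1:]):
        if label != merged[-1]:
            cuts.append((prev[-1][0] + nxt[0][0]) / 2)
            merged.append(label)
    return cuts, merged


intervals = min_bucket_partition(TEMPERATURE, 3)
cuts, labels = merge_and_cut(intervals)
print(cuts, labels)
```

With min = 3 it yields intervals of sizes 5, 5 and 4; the first two both predict "yes" and merge, leaving a single breakpoint at 77.5, which is the temperature rule in the resulting rule set.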

With overfitting avoidance
- Resulting rule set:

    Attribute    Rules                     Errors  Total errors
    Outlook      Sunny → No                2/5     4/14
                 Overcast → Yes            0/4
                 Rainy → Yes               2/5
    Temperature  ≤ 77.5 → Yes              3/10    5/14
                 > 77.5 → No*              2/4
    Humidity     ≤ 82.5 → Yes              1/7     3/14
                 > 82.5 and ≤ 95.5 → No    2/6
                 > 95.5 → Yes              0/1
    Windy        False → Yes               2/8     5/14
                 True → No*                3/6

Discussion of 1R
- 1R was described in a paper by Holte (1993)
  - Contains an experimental evaluation on 16 datasets (using cross-validation so that results were representative of performance on future data)
  - Minimum number of instances was set to 6 after some experimentation
  - 1R's simple rules performed not much worse than much more complex decision trees
- Simplicity first pays off!

Very Simple Classification Rules Perform Well on Most Commonly Used Datasets
Robert C. Holte, Computer Science Department, University of Ottawa

Covering algorithms
- Convert decision tree into a rule set
  - Straightforward, but rule set overly complex
  - More effective conversions are not trivial
- Instead, can generate rule set directly
  - for each class in turn find the rule set that covers all instances in it (excluding instances not in the class)
- Called a covering approach:
  - at each stage a rule is identified that "covers" some of the instances

Example: generating a rule

[Figure: instances of classes a and b in the x/y plane, split first at x = 1.2 and then at y = 2.6]

    If true then class = a
    If x > 1.2 then class = a
    If x > 1.2 and y > 2.6 then class = a

- Possible rule set for class "b":
    If x ≤ 1.2 then class = b
    If x > 1.2 and y ≤ 2.6 then class = b
- Could add more rules, get "perfect" rule set

Rules vs. trees
- Corresponding decision tree: (produces exactly the same predictions)
- But: rule sets can be more perspicuous when decision trees suffer from replicated subtrees
- Also: in multiclass situations, a covering algorithm concentrates on one class at a time whereas a decision tree learner takes all classes into account

Simple covering algorithm
- Generates a rule by adding tests that maximize the rule's accuracy
- Similar to the situation in decision trees: the problem of selecting an attribute to split on
  - But: a decision tree inducer maximizes overall purity
- Each new test reduces the rule's coverage

[Figure: space of examples; rule so far; rule after adding new term]

Selecting a test
- Goal: maximize accuracy
  - t: total number of instances covered by the rule
  - p: positive examples of the class covered by the rule
  - t - p: number of errors made by the rule
- Select the test that maximizes the ratio p/t
- We are finished when p/t = 1 or the set of instances can't be split any further

Example: contact lens data
- Rule we seek:
    If ? then recommendation = hard
- Possible tests:
    Age = Young                            2/8
    Age = Pre-presbyopic                   1/8
    Age = Presbyopic                       1/8
    Spectacle prescription = Myope         3/12
    Spectacle prescription = Hypermetrope  1/12
    Astigmatism = no                       0/12
    Astigmatism = yes                      4/12
    Tear production rate = Reduced         0/12
    Tear production rate = Normal          4/12

Modified rule and resulting data
- Rule with best test added:
    If astigmatism = yes then recommendation = hard
- Instances covered by modified rule:

    Age             Spectacle prescription  Astigmatism  Tear production rate  Recommended lenses
    Young           Myope                   Yes          Reduced               None
    Young           Myope                   Yes          Normal                Hard
    Young           Hypermetrope            Yes          Reduced               None
    Young           Hypermetrope            Yes          Normal                Hard
    Pre-presbyopic  Myope                   Yes          Reduced               None
    Pre-presbyopic  Myope                   Yes          Normal                Hard
    Pre-presbyopic  Hypermetrope            Yes          Reduced               None
    Pre-presbyopic  Hypermetrope            Yes          Normal                None
    Presbyopic      Myope                   Yes          Reduced               None
    Presbyopic      Myope                   Yes          Normal                Hard
    Presbyopic      Hypermetrope            Yes          Reduced               None
    Presbyopic      Hypermetrope            Yes          Normal                None

Further refinement
- Current state:
    If astigmatism = yes and ? then recommendation = hard
- Possible tests:
    Age = Young                            2/4
    Age = Pre-presbyopic                   1/4
    Age = Presbyopic                       1/4
    Spectacle prescription = Myope         3/6
    Spectacle prescription = Hypermetrope  1/6
    Tear production rate = Reduced         0/6
    Tear production rate = Normal          4/6

Modified rule and resulting data
- Rule with best test added:
    If astigmatism = yes and tear production rate = normal then recommendation = hard
- Instances covered by modified rule:

    Age             Spectacle prescription  Astigmatism  Tear production rate  Recommended lenses
    Young           Myope                   Yes          Normal                Hard
    Young           Hypermetrope            Yes          Normal                Hard
    Pre-presbyopic  Myope                   Yes          Normal                Hard
    Pre-presbyopic  Hypermetrope            Yes          Normal                None
    Presbyopic      Myope                   Yes          Normal                Hard
    Presbyopic      Hypermetrope            Yes          Normal                None

Further refinement
- Current state:
    If astigmatism = yes and tear production rate = normal and ? then recommendation = hard
- Possible tests:
    Age = Young                            2/2
    Age = Pre-presbyopic                   1/2
    Age = Presbyopic                       1/2
    Spectacle prescription = Myope         3/3
    Spectacle prescription = Hypermetrope  1/3
- Tie between the first and the fourth test
  - We choose the one with greater coverage
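The test-selection criterion, including the tie-break, fits in a single `max` call. This sketch scores each candidate by the pair (p/t, p), using exact fractions so that equal accuracies compare as equal; any tie that still remains goes to the candidate listed first, which is how Python's `max` behaves and is an assumption rather than part of the slides. The candidate lists hold the p/t figures from the first and third steps of the contact lens example; `select_test` is a hypothetical name.

```python
from fractions import Fraction


def select_test(candidates):
    """Pick the test with the highest accuracy p/t, breaking ties by larger p.

    candidates: list of (description, p, t) with t > 0.
    """
    return max(candidates, key=lambda c: (Fraction(c[1], c[2]), c[1]))


# First step: If ? then recommendation = hard
step1 = [
    ("age = young", 2, 8),
    ("age = pre-presbyopic", 1, 8),
    ("age = presbyopic", 1, 8),
    ("spectacle prescription = myope", 3, 12),
    ("spectacle prescription = hypermetrope", 1, 12),
    ("astigmatism = no", 0, 12),
    ("astigmatism = yes", 4, 12),
    ("tear production rate = reduced", 0, 12),
    ("tear production rate = normal", 4, 12),
]

# Third step: both "age = young" (2/2) and "myope" (3/3) are perfect; p decides.
step3 = [
    ("age = young", 2, 2),
    ("age = pre-presbyopic", 1, 2),
    ("age = presbyopic", 1, 2),
    ("spectacle prescription = myope", 3, 3),
    ("spectacle prescription = hypermetrope", 1, 3),
]

print(select_test(step1)[0])
print(select_test(step3)[0])
```

In the first step, astigmatism = yes and tear production rate = normal tie on both p/t and p, so under this sketch astigmatism wins purely through list order.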

The result
- Final rule:
    If astigmatism = yes and tear production rate = normal and spectacle prescription = myope then recommendation = hard
- Second rule for recommending "hard lenses" (built from instances not covered by the first rule):
    If age = young and astigmatism = yes and tear production rate = normal then recommendation = hard
- These two rules cover all "hard lenses"
  - The process is repeated with the other two classes
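A quick check that the two rules pick out exactly the hard-lens instances, restricted to the twelve astigmatic instances listed earlier (both rules require astigmatism = yes, so no other instance can fire them). The functions `rule1` and `rule2` are hypothetical encodings of the two rules with the astigmatism test dropped, since every row here already satisfies it.

```python
# Instances with astigmatism = yes: (age, spectacle prescription, tear production rate, lenses).
ASTIGMATIC = [
    ("young", "myope", "reduced", "none"),
    ("young", "myope", "normal", "hard"),
    ("young", "hypermetrope", "reduced", "none"),
    ("young", "hypermetrope", "normal", "hard"),
    ("pre-presbyopic", "myope", "reduced", "none"),
    ("pre-presbyopic", "myope", "normal", "hard"),
    ("pre-presbyopic", "hypermetrope", "reduced", "none"),
    ("pre-presbyopic", "hypermetrope", "normal", "none"),
    ("presbyopic", "myope", "reduced", "none"),
    ("presbyopic", "myope", "normal", "hard"),
    ("presbyopic", "hypermetrope", "reduced", "none"),
    ("presbyopic", "hypermetrope", "normal", "none"),
]


def rule1(age, spectacles, tear):
    # astigmatism = yes and tear production rate = normal and spectacle prescription = myope
    return tear == "normal" and spectacles == "myope"


def rule2(age, spectacles, tear):
    # age = young and astigmatism = yes and tear production rate = normal
    return age == "young" and tear == "normal"


predicted_hard = [row for row in ASTIGMATIC if rule1(*row[:3]) or rule2(*row[:3])]
actual_hard = [row for row in ASTIGMATIC if row[3] == "hard"]
print(predicted_hard == actual_hard)
```

Both rules cover the young, myope instance; since both predict hard, that overlap is harmless here.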

Pseudo-code for PRISM

    For each class C
      Initialize E to the instance set
      While E contains instances in class C
        Create a rule R with an empty left-hand side that predicts class C
        Until R is perfect (or there are no more attributes to use) do
          For each attribute A not mentioned in R, and each value v,
            Consider adding the condition A = v to the left-hand side of R
          Select A and v to maximize the accuracy p/t
            (break ties by choosing the condition with the largest p)
          Add A = v to R
        Remove the instances covered by R from E
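The pseudo-code translates almost line for line into Python. This sketch runs the body of the outer loop for a single class C, assuming the 24-instance contact lens dataset used in these examples, with shortened attribute names (`spectacles`, `tear`) chosen here for illustration. Candidates are scored by (p/t, p) with exact fractions, and any remaining tie goes to the condition considered first in attribute order; that last tie-break is an assumption. `prism_for_class` and `covers` are hypothetical names.

```python
from fractions import Fraction

# Attribute values in a fixed order; this order also settles remaining ties (assumption).
ATTRIBUTES = {
    "age": ["young", "pre-presbyopic", "presbyopic"],
    "spectacles": ["myope", "hypermetrope"],
    "astigmatism": ["no", "yes"],
    "tear": ["reduced", "normal"],
}
NAMES = list(ATTRIBUTES)

# Contact lens data: age, spectacles, astigmatism, tear rate, recommended lenses.
RAW = """\
young myope no reduced none
young myope no normal soft
young myope yes reduced none
young myope yes normal hard
young hypermetrope no reduced none
young hypermetrope no normal soft
young hypermetrope yes reduced none
young hypermetrope yes normal hard
pre-presbyopic myope no reduced none
pre-presbyopic myope no normal soft
pre-presbyopic myope yes reduced none
pre-presbyopic myope yes normal hard
pre-presbyopic hypermetrope no reduced none
pre-presbyopic hypermetrope no normal soft
pre-presbyopic hypermetrope yes reduced none
pre-presbyopic hypermetrope yes normal none
presbyopic myope no reduced none
presbyopic myope no normal none
presbyopic myope yes reduced none
presbyopic myope yes normal hard
presbyopic hypermetrope no reduced none
presbyopic hypermetrope no normal soft
presbyopic hypermetrope yes reduced none
presbyopic hypermetrope yes normal none"""
DATA = [dict(zip(NAMES + ["lenses"], line.split())) for line in RAW.splitlines()]


def covers(rule, row):
    """True if the row satisfies every (attribute, value) condition in the rule."""
    return all(row[a] == v for a, v in rule)


def prism_for_class(data, target, class_attr="lenses"):
    """Learn rules for one class, following the PRISM pseudo-code."""
    rules = []
    remaining = list(data)  # E
    while any(row[class_attr] == target for row in remaining):
        rule, covered = [], remaining  # empty left-hand side covers all of E
        # Grow R until it is perfect or every attribute has been used.
        while any(row[class_attr] != target for row in covered) and len(rule) < len(NAMES):
            used = {a for a, _ in rule}
            best, best_key = None, None
            for a in NAMES:
                if a in used:
                    continue
                for v in ATTRIBUTES[a]:
                    subset = [row for row in covered if row[a] == v]
                    if not subset:
                        continue  # t = 0: the condition covers nothing
                    p = sum(row[class_attr] == target for row in subset)
                    key = (Fraction(p, len(subset)), p)  # accuracy p/t, then p
                    if best_key is None or key > best_key:  # strict: earlier wins ties
                        best, best_key = (a, v), key
            rule.append(best)
            covered = [row for row in covered if covers(rule, row)]
        rules.append(rule)
        # Remove the instances covered by R from E.
        remaining = [row for row in remaining if not covers(rule, row)]
    return rules


hard_rules = prism_for_class(DATA, "hard")
for r in hard_rules:
    print(" and ".join(f"{a} = {v}" for a, v in r), "-> hard")
```

For class hard this yields the two rules derived step by step in the example. Calling it for every class, e.g. `{c: prism_for_class(DATA, c) for c in ("hard", "soft", "none")}`, plays the role of the outer loop.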

Rules vs. decision lists
- PRISM with the outer loop removed generates a decision list for one class
  - Subsequent rules are designed for instances that are not covered by previous rules
  - But: order doesn't matter because all rules predict the same class
- The outer loop considers all classes separately
  - No order dependence implied
- Problems: overlapping rules, default rule required

Separate and conquer
- Methods like PRISM (for dealing with one class) are separate-and-conquer algorithms:
  - First, identify a useful rule
  - Then, separate out all the instances it covers
  - Finally, "conquer" the remaining instances
- Difference to divide-and-conquer methods:
  - Subset covered by rule doesn't need to be explored any further