Decision Trees: Representa:on

Size: px

Start display at page:

Download "Decision Trees: Representa:on"

Charla Lewis
5 years ago
Views:

1 Decision Trees: Representa:on Machine Learning Fall 2017 Some slides from Tom Mitchell, Dan Roth and others 1

2 Key issues in machine learning Modeling How to formulate your problem as a machine learning problem? How to represent data? Which algorithms to use? What learning protocols? Representa:on Good hypothesis spaces and good features Algorithms What is a good learning algorithm? What is success? Generaliza:on vs overfimng The computa:onal ques:on: How long will learning take? 2

3 Coming up (the rest of the semester) Different hypothesis spaces and learning algorithms Decision trees and the ID3 algorithm Linear classifiers Perceptron Winnow SVM Logis:c regression Combining mul:ple classifiers Boos:ng, bagging Non- linear classifiers Nearest neighbors 3

4 Coming up (the rest of the semester) Different hypothesis spaces and learning algorithms Decision trees and the ID3 algorithm Linear classifiers Important issues to consider Perceptron Winnow SVM Logis:c regression 1. What do these hypotheses represent? 2. Implicit assump:ons and tradeoffs Combining mul:ple classifiers Boos:ng, 3. Generaliza:on? bagging Non- linear classifiers Nearest 4. neighbors How do we learn? 4

5 This lecture: Learning Decision Trees 1. Representa+on: What are decision trees? 2. Algorithm: Learning decision trees The ID3 algorithm: A greedy heuris:c 3. Some extensions 5

6 This lecture: Learning Decision Trees 1. Representa+on: What are decision trees? 2. Algorithm: Learning decision trees The ID3 algorithm: A greedy heuris:c 3. Some extensions 6

7 Represen:ng data Data can be represented as a big table, with columns deno:ng different a_ributes Name Label Claire Cardie - Peter Bartle_ + Eric Baum + Haym Hirsh + Shai Ben- David + Michael I. Jordan - 7

8 Represen:ng data Data can be represented as a big table, with columns deno:ng different a_ributes Name Special character in name? Second character of first name Length of first name>5? Gender Claire Cardie No l Yes Female - Peter Bartle_ No e No Male + Eric Baum No r No Male + Haym Hirsh No a No Male + Label Shai Ben- David Michael I. Jordan Yes h No Male + Yes i Yes Male - 8

Represen:ng data Data can be represented as a big table, with columns deno:ng different a_ributes Name Special character in name? Second character of first name Length of first name>5?

9 Represen:ng data Data can be represented as a big table, with columns deno:ng different a_ributes Name Special character in name? Second character of first name Length of first name>5? Gender Claire With Cardie these four No a_ributes, how l many unique rows Yes are possible? Female = 2704 Peter Bartle_ No e No Male + Eric Baum If there are 100 No a_ributes, all binary, r how many No unique rows Male are possible? Haym Hirsh No a No Male + Label Shai Ben- David Michael I. Jordan Yes h No Male - Yes i Yes Male + 9

10 Represen:ng data Data can be represented as a big table, with columns deno:ng different a_ributes Name Special character in name? Second character of first name Length of first name>5? Gender Claire With Cardie these four No a_ributes, how l many unique rows Yes are possible? Female = 208 Peter Bartle_ No e No Male + Eric Baum If there are 100 No a_ributes, all binary, r how many No unique rows Male are possible? Haym Hirsh No a No Male + Label Shai Ben- David Michael I. Jordan Yes h No Male - Yes i Yes Male + 10

11 Represen:ng data Data can be represented as a big table, with columns deno:ng different a_ributes Name Special character in name? Second character of first name Length of first name>5? Gender Claire With Cardie these four No a_ributes, how l many unique rows Yes are possible? Female = 208 Peter Bartle_ No e No Male + Eric Baum If there are 100 No a_ributes, all binary, r how many No unique rows Male are possible? Haym Hirsh No a No Male + Label Shai Ben- David Michael I. Jordan Yes h No Male - Yes i Yes Male + 11

12 Represen:ng data Data can be represented as a big table, with columns deno:ng different a_ributes Name Special character in name? Second character of first name Length of first name>5? Gender Claire With Cardie these four No a_ributes, how l many unique rows Yes are possible? Female = 208 Peter Bartle_ No e No Male + Eric Baum If there are 100 No a_ributes, all binary, r how many No unique rows Male are possible? Haym Hirsh No a No Male + Label Shai Ben- David Michael I. Jordan Yes h No Male - Yes i Yes Male + 12

13 Represen:ng data Data can be represented as a big table, with columns deno:ng different a_ributes Name Special character in name? Second character of first name Length of first name>5? Gender Claire With Cardie these four No a_ributes, how l many unique rows Yes are possible? Female = 208 Peter Bartle_ No e No Male + Eric Baum If there are 100 No a_ributes, all binary, r how many No unique rows Male are possible? - Haym Hirsh No a No Male + We need to figure out how to represent in a be_er, more efficient way Shai Ben- Yes h No Male - David Michael I. Jordan Label Yes i Yes Male + 13

14 What are decision trees? A hierarchical data structure that represents data using a divide- and- conquer strategy Can be used as hypothesis class for non- parametric classifica:on or regression General idea: Given a collec:on of examples, learn a decision tree that represents it 14

15 What are decision trees? Decision trees are a family of classifiers for instances that are represented by collec:ons of a_ributes (i.e. features) Nodes are tests for feature values There is one branch for every value that the feature can take Leaves of the tree specify the class labels 15

16 Let s build a decision tree for classifying shapes Label=C Label=B Label=A 16

17 Let s build a decision tree for classifying shapes Label=C Label=B Label=A Before building a decision tree: What is the label for a red triangle? And why? 17

18 Let s build a decision tree for classifying shapes Label=C Label=B Label=A What are some a_ributes of the examples? 18

19 Let s build a decision tree for classifying shapes Label=C Label=B Label=A What are some a_ributes of the examples? Color, Shape 19

20 Let s build a decision tree for classifying shapes Label=C Label=B Label=A What are some a_ributes of the examples? Color, Shape Color? 20

21 Let s build a decision tree for classifying shapes Label=C Label=B What are some a_ributes of the examples? Color, Shape Color? Blue Red Label=A Green 21

22 Let s build a decision tree for classifying shapes Label=C Label=B What are some a_ributes of the examples? Color, Shape Color? Blue Red B Label=A Green 22

23 Let s build a decision tree for classifying shapes Label=C What are some a_ributes of the examples? Color, Shape Color? B triangle square Blue Shape? A Label=B circle C Red B Label=A Green 23

24 Let s build a decision tree for classifying shapes Label=C What are some a_ributes of the examples? Color, Shape Color? B triangle Blue Shape? square A Label=B circle C Red B square B Label=A Green Shape? circle A 24

25 Let s build a decision tree for classifying shapes 1. How to use a decision tree for predic7on? What is the label for a red triangle? Just follow a path from the root to a leaf What about a green triangle? What are some a_ributes of the examples? Color, Shape Label=C 2. How do we learn Label=B a decision tree? Coming up soon triangle B Label=A Color? Blue Red Green Shape? B Shape? circle square square A C B circle A 25

26 Expressivity of Decision trees What Boolean func:ons can decision trees represent? Any Boolean func:on (Color=blue AND Shape=triangle ) Label=B) AND (Color=blue AND Shape=square ) Label=A) AND (Color=blue AND Shape=circle ) Label=C) AND. Every path from the tree to a root is a rule The full tree is equivalent to the conjunc:on of all the rules Any Boolean func7on can be represented as a decision tree. 26

27 Expressivity of Decision trees What Boolean func:ons can decision trees represent? Any Boolean func:on (Color=blue AND Shape=triangle ) Label=B) AND (Color=blue AND Shape=square ) Label=A) AND (Color=blue AND Shape=circle ) Label=C) AND. Every path from the tree to a root is a rule The full tree is equivalent to the conjunc:on of all the rules Any Boolean func7on can be represented as a decision tree. 27

28 Expressivity of Decision trees What Boolean func:ons can decision trees represent? Any Boolean func:on (Color=blue AND Shape=triangle ) Label=B) AND (Color=blue AND Shape=square ) Label=A) AND (Color=blue AND Shape=circle ) Label=C) AND. Every path from the tree to a root is a rule The full tree is equivalent to the conjunc:on of all the rules Any Boolean func7on can be represented as a decision tree. 28

29 Decision Trees Outputs are discrete categories But real valued outputs are also possible (regression trees) Methods for handling noisy data (noise in the label or in the features) and for handling missing a_ributes Pruning trees helps with noise More on this later 29

30 Numeric a_ributes and decision boundaries We have seen instances represented as a_ribute- value pairs (color=blue, second le_er=e, etc.) Values have been categorical How do we deal with numeric feature values? (eg length =?) Discre:ze them or use thresholds on the numeric values This example divides the feature space into axis parallel rectangles 30

31 Numeric a_ributes and decision boundaries We have seen instances represented as a_ribute- value pairs (color=blue, second le_er=e, etc.) Values have been categorical How do we deal with numeric feature values? (eg length =?) Discre:ze them or use thresholds on the numeric values This example divides the feature space into axis parallel rectangles 31

32 Numeric a_ributes and decision boundaries We have seen instances represented as a_ribute- value pairs (color=blue, second le_er=e, etc.) Values have been categorical How do we deal with numeric feature values? (eg length =?) Discre:ze them or use thresholds on the numeric values This example divides the feature space into axis parallel rectangles 32

33 Numeric a_ributes and decision boundaries We have seen instances represented as a_ribute- value pairs (color=blue, second le_er=e, etc.) Values have been categorical How do we deal with numeric feature values? (eg length =?) Discre:ze them or use thresholds on the numeric values This example divides the feature space into axis parallel rectangles Y X 33

34 Numeric a_ributes and decision boundaries We have seen instances represented as a_ribute- value pairs (color=blue, second le_er=e, etc.) Values have been categorical How do we deal with numeric feature values? (eg length =?) Discre:ze them or use thresholds on the numeric values This example divides the feature space into axis parallel rectangles Y X no Y>7 no yes X<3 no yes Y<5 no yes X < 1 yes

35 Numeric a_ributes and decision boundaries We have seen instances represented as a_ribute- value pairs (color=blue, second le_er=e, etc.) Values have been categorical How do we deal with numeric feature values? (eg length =?) Discre:ze them or use thresholds on the numeric values This example divides the feature space into axis parallel rectangles Decision Y boundaries can be non- linear X no Y>7 no yes X<3 no yes Y<5 no yes X < 1 yes

36 Summary: Decision trees Decision trees can represent any Boolean func:on A way to represent lot of data A natural representa:on (think 20 ques:ons) Predic:ng with a decision tree is easy Clearly, given a dataset, there are many decision trees that can represent it. Why? Learning a good representa:on from data is the next ques:on 36

37 Summary: Decision trees Decision trees can represent any Boolean func:on A way to represent lot of data A natural representa:on (think 20 ques:ons) Predic:ng with a decision tree is easy Clearly, given a dataset, there are many decision trees that can represent it. Why? Learning a good representa:on from data is the next ques:on 37

38 Exercise Write down the decision tree for the shapes data if the root node was Shape instead of Color Will the two trees make the same predic:ons for unseen shapes/color combina:ons? Label=C Label=B Label=A 38

Decision Trees: Representation

Decision Trees: Representation Machine Learning Fall 2018 Some slides from Tom Mitchell, Dan Roth and others 1 Key issues in machine learning Modeling How to formulate your problem as a machine learning