Decision Trees: Representation

Decision Trees: Representation. Machine Learning, Fall 2018. Some slides from Tom Mitchell, Dan Roth, and others.

Key issues in machine learning:
- Modeling: How to formulate your problem as a machine learning problem? How to represent data? Which algorithms to use? What learning protocols?
- Representation: Good hypothesis spaces and good features.
- Algorithms: What is a good learning algorithm? What is success? Generalization vs. overfitting. The computational question: how long will learning take?

Coming up (the rest of the semester): Different hypothesis spaces and learning algorithms:
- Decision trees and the ID3 algorithm
- Linear classifiers: Perceptron, SVM, logistic regression
- Combining multiple classifiers: boosting, bagging
- Non-linear classifiers: nearest neighbors

Important issues to consider:
1. What do these hypotheses represent?
2. Implicit assumptions and tradeoffs
3. Generalization?
4. How do we learn?

This lecture: Learning Decision Trees
1. Representation: What are decision trees?
2. Algorithm: Learning decision trees. The ID3 algorithm: a greedy heuristic.
3. Some extensions

Representing data: Data can be represented as a big table, with columns denoting different attributes.

Name              | Label
Claire Cardie     | -
Peter Bartlett    | +
Eric Baum         | +
Haym Hirsh        | +
Shai Ben-David    | +
Michael I. Jordan | -

Representing data: Data can be represented as a big table, with columns denoting different attributes.

Name              | Name has punctuation? | Second character of first name | Length of first name > 5? | Male/Female? | Label
Claire Cardie     | No  | l | Yes | Female | -
Peter Bartlett    | No  | e | No  | Male   | +
Eric Baum         | No  | r | No  | Male   | +
Haym Hirsh        | No  | a | No  | Male   | +
Shai Ben-David    | Yes | h | No  | Male   | +
Michael I. Jordan | Yes | i | Yes | Male   | -
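As a concrete illustration (not from the slides), here is a minimal sketch of how the four attributes in this table might be computed from a raw name string. The function name and details are assumptions for illustration; note the gender column cannot be derived from the string itself, so it is passed in.

```python
def extract_features(name: str, gender: str) -> dict:
    """Compute the four attributes from the table for one name (a sketch)."""
    first = name.split()[0]
    return {
        "has_punctuation": any(c in ".-'" for c in name),
        "second_char_of_first_name": first[1].lower(),
        "first_name_longer_than_5": len(first) > 5,
        "gender": gender,
    }

print(extract_features("Claire Cardie", "Female"))
# {'has_punctuation': False, 'second_char_of_first_name': 'l',
#  'first_name_longer_than_5': True, 'gender': 'Female'}
```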

With these four attributes, how many unique rows are possible? 2 × 26 × 2 × 2 = 208. If there are 100 attributes, all binary, how many unique rows are possible? 2 × 2 × ⋯ × 2 (100 times) = 2^100. If we wanted to store all possible rows, this number is too large. We need to figure out how to represent data in a better, more efficient way.
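The counting argument, as a quick check: each attribute contributes a factor equal to its number of possible values.

```python
from math import prod

# punctuation?, second character, length > 5?, gender
values_per_attribute = [2, 26, 2, 2]
print(prod(values_per_attribute))  # 208 unique rows

print(2 ** 100)  # 100 binary attributes: 1267650600228229401496703205376
```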

What are decision trees? A hierarchical data structure that represents data using a divide-and-conquer strategy. Decision trees can be used as a hypothesis class for non-parametric classification or regression. General idea: given a collection of examples, learn a decision tree that represents it.

What are decision trees? Decision trees are a family of classifiers for instances that are represented by collections of attributes (i.e., features). Nodes are tests for feature values; there is one branch for every value that the feature can take; leaves of the tree specify the class labels.
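A minimal sketch (assumed, not from the slides) of the data structure this describes: internal nodes test an attribute and have one child per attribute value, and leaves store a class label.

```python
from dataclasses import dataclass, field

@dataclass
class Leaf:
    label: str  # the class label stored at this leaf

@dataclass
class Node:
    attribute: str                                 # the feature this node tests
    children: dict = field(default_factory=dict)   # feature value -> Node or Leaf

def predict(tree, example: dict) -> str:
    """Classify an example by following a path from the root to a leaf."""
    while isinstance(tree, Node):
        tree = tree.children[example[tree.attribute]]
    return tree.label
```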

Let's build a decision tree for classifying shapes. [Figure: a collection of colored shapes, grouped by their class labels: Label=A, Label=B, Label=C.] Before building a decision tree: what is the label for a red triangle, and why? What are some attributes of the examples? Color and Shape.

Start building the tree by testing one attribute at the root, say Color?, with one branch for each of its values: Blue, Red, Green. All red examples share one label, so the Red branch ends in a leaf labeled B. The Blue and Green branches still contain mixed labels, so each tests Shape? next. The finished tree:

Color?
  Blue -> Shape?
    triangle -> B
    square -> A
    circle -> C
  Red -> B
  Green -> Shape?
    square -> B
    circle -> A

1. How do we learn a decision tree? Coming up soon. 2. How do we use a decision tree for prediction? Just follow a path from the root to a leaf. What is the label for a red triangle? Following Color? = Red gives the leaf B. What about a green triangle? (Note that the Green subtree has no branch for triangles.)
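The finished shape tree, written as nested tuples and dictionaries (a sketch; the slide only shows the drawing). Prediction follows a path from the root to a leaf, exactly as described above.

```python
# Internal node: (attribute, {value: subtree}); leaf: a class label string.
shape_tree = ("Color", {
    "Blue":  ("Shape", {"triangle": "B", "square": "A", "circle": "C"}),
    "Red":   "B",
    "Green": ("Shape", {"square": "B", "circle": "A"}),
})

def predict(tree, example: dict) -> str:
    while isinstance(tree, tuple):        # internal node: keep descending
        attribute, branches = tree
        tree = branches[example[attribute]]
    return tree                           # leaf: return its label

print(predict(shape_tree, {"Color": "Red", "Shape": "triangle"}))  # B
# A green triangle raises KeyError: the tree has no branch for that value,
# which is exactly the open question posed on the slide.
```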

Expressivity of decision trees: What Boolean functions can decision trees represent? Any Boolean function. Every path from the root to a leaf is a rule, and the full tree is equivalent to the conjunction of all the rules:

(Color=Blue AND Shape=triangle => Label=B)
AND (Color=Blue AND Shape=square => Label=A)
AND (Color=Blue AND Shape=circle => Label=C)
AND ...

Any Boolean function can be represented as a decision tree.
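A small sketch supporting this claim (my construction, not from the slides): any Boolean function can be represented by building the full tree that tests every variable in turn, with each leaf storing the function's value on that complete assignment. The tree is exponentially large, but it always exists.

```python
def build_tree(f, variables, assignment=()):
    """Build the full decision tree for Boolean function f over the given variables."""
    if not variables:
        return f(*assignment)                 # leaf: the function's value
    return (variables[0], {                   # internal node: test the next variable
        value: build_tree(f, variables[1:], assignment + (value,))
        for value in (False, True)
    })

xor_tree = build_tree(lambda x, y: x != y, ["x", "y"])
print(xor_tree)
# ('x', {False: ('y', {False: False, True: True}),
#        True:  ('y', {False: True,  True: False})})
```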

Decision trees: Outputs are discrete categories, but real-valued outputs are also possible (regression trees). There are well-studied methods for handling noisy data (noise in the label or in the features) and for handling missing attributes; pruning trees helps with noise. More on this later.

Numeric attributes and decision boundaries: We have seen instances represented as attribute-value pairs (color=blue, second letter=e, etc.), where the values have been categorical. How do we deal with numeric feature values (e.g., length)? Discretize them, or use thresholds on the numeric values. Each threshold test splits one coordinate, so the tree divides the feature space into axis-parallel rectangles. [Figure: points labeled + and - in the (X, Y) plane, with thresholds at X=1, X=3, Y=5, and Y=7, alongside the corresponding decision tree built from the tests Y>7, X<3, Y<5, and X<1.] Note that even though each individual test is a simple threshold, the overall decision boundary can be non-linear.
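The exact tree drawn on this slide is only partially recoverable from the transcription, but the idea is that each root-to-leaf path is a conjunction of threshold tests. A hypothetical tree using the tests named on the slide might read as nested comparisons like this:

```python
# Hypothetical threshold tree over the tests Y > 7, X < 3, Y < 5, X < 1;
# the exact structure on the slide may differ. Each root-to-leaf path is a
# conjunction of threshold tests and carves out an axis-parallel rectangle.
def classify(x: float, y: float) -> str:
    if y > 7:                       # everything above Y = 7
        return "+"
    if x < 3:
        if y < 5:
            return "+" if x < 1 else "-"
        return "-"
    return "+"
```

Because every test compares a single coordinate to a constant, the regions assigned to each label are unions of rectangles, which is why the combined boundary is piecewise axis-aligned rather than a single line.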

Summary: Decision trees can represent any Boolean function. They are a way to represent a lot of data, and a natural representation (think of the game of twenty questions). Predicting with a decision tree is easy. Clearly, given a dataset, there are many decision trees that can represent it. [Exercise: Why?] Learning a good representation from data is the next question.

Exercises [refer to the shapes figure with labels A, B, and C from earlier]:
1. Write down the decision tree for the shapes data if the root node tested Shape instead of Color.
2. Will the two trees make the same predictions for unseen shape/color combinations?
3. Show that multiple structurally different decision trees can represent the same Boolean function of two or more variables.