ARTIFICIAL INTELLIGENCE (CS 370D)


Princess Nora University Faculty of Computer & Information Systems ARTIFICIAL INTELLIGENCE (CS 370D)

(CHAPTER-18) LEARNING FROM EXAMPLES DECISION TREES

Outline: 1- Introduction 2- Know your data 3- Classification tree construction schema

Know your data?

Types of Data Sets:
- Record data: relational records; data matrices (e.g., a numerical matrix, crosstabs); document data (text documents represented as term-frequency vectors, e.g. per-document counts of terms such as team, coach, play, ball, score, game, win, lost, timeout, season); transaction data, e.g.:
  TID 1: Bread, Coke, Milk
  TID 2: Beer, Bread
  TID 3: Beer, Coke, Diaper, Milk
  TID 4: Beer, Bread, Diaper, Milk
  TID 5: Coke, Diaper, Milk
- Graph and network data: the World Wide Web; social or information networks; molecular structures
- Ordered data: video data (a sequence of images); temporal data (time series); sequential data (transaction sequences); genetic sequence data
- Spatial, image and multimedia data: spatial data (maps); image data; video data

Types of Variables: qualitative variables (data that are counted) and quantitative variables (data that are measured).

Types of Variables: Nominal (data that are counted). Named, i.e. classified into one or more qualitative categories or descriptions. In medicine, nominal variables are often used to describe the patient. Examples of nominal variables include: Gender (male, female); Eye color (blue, brown, green, hazel); Surgical outcome (dead, alive); Blood type (A, B, AB, O).

Types of Variables: Ordinal (data that are counted). Values have a meaningful order (ranking). In medicine, ordinal variables often describe the patient's characteristics, attitude, behavior, or status. Examples of ordinal variables include: Stage of cancer (stage I, II, III, IV); Education level (elementary, secondary, college); Pain level (mild, moderate, severe); Satisfaction level (very dissatisfied, dissatisfied, neutral, satisfied, very satisfied); Agreement level (strongly disagree, disagree, neutral, agree, strongly agree).

Types of Variables: Discrete (data that are counted). Discrete variables assume values corresponding to isolated points along a line interval; that is, there is a gap between any two values. Interval (data that are measured): variables that have constant, equal distances between values, but whose zero point is arbitrary. Examples of interval variables: Intelligence (IQ test score of 100, 110, 120, etc.); Pain level (1-10 scale); Body length in infants.

Types of Variables: Continuous (data that are measured). Continuous variables can take an uncountable number of values: they can assume any value along a line interval, including every possible value between any two values. Practically, real values can only be measured and represented using a finite number of digits, so continuous attributes are typically represented as floating-point variables.

Classification

Classification: Definition. Given a collection of records (the training set), each record contains a set of attributes, one of which is the class. The task is to find a model for the class attribute as a function of the values of the other attributes. Goal: previously unseen records should be assigned a class as accurately as possible. A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
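The train/test split mentioned above can be sketched in a few lines. The helper below is an illustrative assumption (not from the slides): it shuffles a list of (attributes, class) records and divides it, with an arbitrarily chosen split fraction.

```python
import random

def train_test_split(records, test_fraction=0.3, seed=42):
    """Shuffle the labeled records and divide them into a training set
    (used to build the model) and a test set (used to validate it)."""
    rng = random.Random(seed)
    shuffled = records[:]                      # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    cut = int(round(len(shuffled) * (1 - test_fraction)))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical records shaped like the training table on the next slide: (attributes, class)
records = [({"Attrib1": "Yes", "Attrib2": "Large", "Attrib3": "125K"}, "No"),
           ({"Attrib1": "No", "Attrib2": "Medium", "Attrib3": "100K"}, "No"),
           ({"Attrib1": "No", "Attrib2": "Small", "Attrib3": "70K"}, "No"),
           ({"Attrib1": "No", "Attrib2": "Large", "Attrib3": "95K"}, "Yes")]
train, test = train_test_split(records)
```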

Illustrating Classification Task

Training Set:
Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes

Test Set:
Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?

The learning algorithm induces a model from the training set (Learn Model / Induction); the model is then applied to the test set (Apply Model / Deduction).

Decision Tree Learning

Definition: A decision tree is a classifier in the form of a tree structure. Decision node: specifies a test on a single attribute. Leaf node: indicates the value of the target attribute. Arc/edge: a split on one attribute's values. Path: a conjunction of tests leading to the final decision (the tree as a whole represents a disjunction of such paths). Decision trees classify instances or examples by starting at the root of the tree and moving through it until a leaf node is reached.
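As a rough sketch of that root-to-leaf traversal, the code below walks a small hand-written tree; the nested-dict representation and the example tree (built from the play-tennis attributes used later in these slides) are illustrative assumptions.

```python
# A decision node is {"attribute": name, "branches": {value: subtree}};
# a leaf node is {"label": class_value}.
def classify(tree, example):
    """Start at the root and follow the branch matching the example's
    attribute value until a leaf node gives the class label."""
    node = tree
    while "label" not in node:
        node = node["branches"][example[node["attribute"]]]
    return node["label"]

# Hypothetical tree for "shall we play tennis today?"
tree = {"attribute": "outlook",
        "branches": {"sunny": {"attribute": "humidity",
                               "branches": {"high": {"label": "no"},
                                            "normal": {"label": "yes"}}},
                     "overcast": {"label": "yes"},
                     "rain": {"attribute": "wind",
                              "branches": {"weak": {"label": "yes"},
                                           "strong": {"label": "no"}}}}}
print(classify(tree, {"outlook": "rain", "wind": "weak"}))   # -> yes
```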

Key requirements: Attribute-value description: the object or case must be expressible in terms of a fixed collection of properties or attributes (e.g., hot, mild, cold). Predefined classes (target values): the target function has discrete output values (binary or multiclass). Sufficient data: enough training cases should be provided to learn the model.

Training Examples

Shall we play tennis today?

Decision Tree Construction. Top-down tree construction schema: examine the training database and find the best splitting predicate for the root node; partition the training database; recurse on each child node.
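A minimal sketch of that top-down schema, assuming records are (attributes, class) pairs and that the split-selection heuristic (discussed in the splitting section below) is passed in as a function; leaves fall back to the majority class.

```python
from collections import Counter

def build_tree(records, attributes, choose_split):
    """Top-down construction: find the best splitting attribute for this node,
    partition the training records on its values, and recurse on each child."""
    classes = [cls for _, cls in records]
    if len(set(classes)) == 1 or not attributes:
        # Pure node, or no attributes left to test: make a leaf with the majority class.
        return {"label": Counter(classes).most_common(1)[0][0]}
    best = choose_split(records, attributes)          # e.g. the max-gain heuristic
    partitions = {}
    for attrs, cls in records:                        # partition the training database
        partitions.setdefault(attrs[best], []).append((attrs, cls))
    remaining = [a for a in attributes if a != best]
    return {"attribute": best,
            "branches": {value: build_tree(subset, remaining, choose_split)
                         for value, subset in partitions.items()}}
```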

Advantages of using DT: Fast to implement. Simple to implement, because it performs classification without much computation. Can convert the result to a set of easily interpretable rules that can be used in a knowledge system such as a database, where rules are built from the labels of the nodes and the labels of the arcs. Can handle both continuous and categorical variables. Can handle noisy data. Provides a clear indication of which fields are most important for prediction or classification.

Disadvantages of using DT: "Univariate" splits/partitioning use only one attribute at a time, which limits the types of possible trees. Large decision trees may be hard to understand. Performs poorly with many classes and small data.

Decision Tree Construction (cont.). Two important algorithmic components: 1. Splitting (ID3, CART, C4.5, QUEST, CHAID, ...) 2. Missing Values

1- Splitting

2.1- Splitting depends on: the database attribute types, and the statistical calculation technique or method (ID3, CART, C4.5, QUEST, CHAID, ...).

Attribute types. 2.1- Split selection: selection of an attribute to test at each node, i.e. choosing the most useful attribute for classifying examples. (The columns of the training table are the attributes or features, plus the class label or decision.) How to specify the test condition at each node?

Attribute types. How to specify the test condition at each node? Some possibilities are: Random: select any attribute at random. Least-Values: choose the attribute with the smallest number of possible values (fewer branches). Most-Values: choose the attribute with the largest number of possible values (smaller subsets). Max-Gain: choose the attribute that has the largest expected information gain, i.e. the attribute that will result in the smallest expected size of the subtrees rooted at its children.

Attribute types. How to specify the test condition at each node? It depends on the attribute type: 2.1.1 Nominal, 2.1.2 Continuous.

Attribute types. 2.1.1 Nominal or Categorical (Discrete): the domain is a finite set without any natural ordering (e.g., occupation, marital status (single, married, divorced)). Each non-leaf node is a test, and its edges partition the attribute's values into subsets (easy for a discrete attribute). (Example figure: a split on temperature with branches hot, mild, cool.)

Attribute types. 2.1.2 Continuous or Numerical: the domain is ordered and can be represented on the real line (e.g., age, income, temperature degrees). Two options: convert to binary, by creating a new Boolean attribute A_c based on a threshold c (A_c is true if A >= c, false otherwise); or discretization, to form an ordinal categorical attribute whose ranges can be found, e.g., by equal intervals. How to find the best threshold?
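One common way to answer "how to find the best threshold" (the usual candidate-split search used by C4.5-style learners, not something spelled out on the slide): sort the numeric values, consider the midpoint between each pair of adjacent distinct values, and keep the candidate whose binary split looks purest. For brevity this sketch scores candidates by misclassification count rather than information gain; the sample ages and labels are made up for illustration.

```python
from collections import Counter

def best_threshold(values, labels):
    """Return the threshold c whose split (value < c vs value >= c) misclassifies
    the fewest training examples when each side predicts its majority class."""
    pairs = sorted(zip(values, labels))
    best_c, best_errors = None, len(pairs) + 1
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                                  # identical values: no boundary here
        c = (pairs[i - 1][0] + pairs[i][0]) / 2       # candidate threshold: the midpoint
        left = [lbl for v, lbl in pairs if v < c]
        right = [lbl for v, lbl in pairs if v >= c]
        errors = ((len(left) - Counter(left).most_common(1)[0][1]) +
                  (len(right) - Counter(right).most_common(1)[0][1]))
        if errors < best_errors:
            best_c, best_errors = c, errors
    return best_c

# Hypothetical ages with a yes/no class: the best cut lands between 30 and 45.
print(best_threshold([22, 25, 30, 45, 50, 61], ["no", "no", "no", "yes", "yes", "yes"]))  # 37.5
```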


Statistical calculation techniques: Entropy, Information gain, Gain ratio.

Entropy. Entropy at a given node t: Entropy(t) = - sum_j p(j|t) log2 p(j|t), where p(j|t) is the relative frequency of class j at node t. Entropy measures the homogeneity of a node: it is maximum (log2 n_c, for n_c classes) when records are equally distributed among all classes, implying the least information, and minimum (0.0) when all records belong to one class, implying the most information.
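A tiny sketch of that entropy formula, just to verify the maximum/minimum claims above; the class labels are illustrative.

```python
import math
from collections import Counter

def node_entropy(class_labels):
    """Entropy(t) = - sum_j p(j|t) * log2 p(j|t), where p(j|t) is the relative
    frequency of class j at node t (written below as p * log2(1/p), which is
    the same quantity without the leading minus sign)."""
    total = len(class_labels)
    return sum((n / total) * math.log2(total / n)
               for n in Counter(class_labels).values())

print(node_entropy(["yes", "yes", "yes", "yes"]))   # 0.0: pure node, minimum entropy
print(node_entropy(["yes", "no", "yes", "no"]))     # 1.0: evenly split, maximum log2(2)
```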

Example: credit risk data

No  Risk      Credit history  Debt  Collateral  Income
1   high      bad             high  none        $0-15
2   high      unknown         high  none        $15-35
3   moderate  unknown         low   none        $15-35
4   high      unknown         low   none        $0-15
5   low       unknown         low   none        over $35
6   low       unknown         low   adequate    over $35
7   high      bad             low   none        $0-15
8   moderate  bad             low   adequate    over $35
9   low       good            low   none        over $35
10  low       good            high  adequate    over $35
11  high      good            high  none        $0-15
12  moderate  good            high  none        $15-35
13  low       good            high  none        over $35
14  high      bad             high  none        $15-35

Example: starting with the full population of loans, suppose we first select the income property. This separates the examples into three partitions. All examples in the leftmost partition ($0-15) have the same conclusion, HIGH RISK; the other partitions can be further subdivided by selecting another property.

ID3 & information theory: the selection of which property to split on next is based on information theory. The information content of a tree is defined by I[tree] = sum_i -prob(classification_i) * log2(prob(classification_i)). E.g., in the credit risk data there are 14 samples: prob(high risk) = 6/14, prob(moderate risk) = 3/14, prob(low risk) = 5/14. The information content of a tree that correctly classifies these examples is: I[tree] = -6/14 * log2(6/14) + -3/14 * log2(3/14) + -5/14 * log2(5/14) = -6/14 * -1.222 + -3/14 * -2.222 + -5/14 * -1.485 = 1.531

ID3 & more information theory: example. After splitting on a property, consider the expected (or remaining) information content of the subtrees: E[property] = sum_i (# in subtree_i / # of samples) * I[subtree_i]. Splitting the 14 loans on income gives three partitions with classes (H H H H), (H M M H) and (L L M L L L), where H = high, M = moderate, L = low risk, so:
E[income] = 4/14 * I[subtree_1] + 4/14 * I[subtree_2] + 6/14 * I[subtree_3]
= 4/14 * (-4/4 log2(4/4) - 0/4 log2(0/4) - 0/4 log2(0/4)) + 4/14 * (-2/4 log2(2/4) - 2/4 log2(2/4) - 0/4 log2(0/4)) + 6/14 * (-0/6 log2(0/6) - 1/6 log2(1/6) - 5/6 log2(5/6))
= 4/14 * (0.0 + 0.0 + 0.0) + 4/14 * (0.5 + 0.5 + 0.0) + 6/14 * (0.0 + 0.43 + 0.22)
= 0.0 + 0.29 + 0.28 = 0.57
(terms of the form 0/n log2(0/n) are taken to be 0)

Credit risk example (cont.): what about the other property options, E[debt], E[history], and E[collateral]? After further analysis: E[income] = 0.57, E[debt] = 1.47, E[history] = 1.26, E[collateral] = 1.33. The ID3 selection rule splits on the property that produces the minimal E[property]; in this example, income will be the first property split. Then repeat the process on each subtree.
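These numbers can be reproduced directly from the example table. The short script below is a sketch (income ranges renamed to plain strings) that computes I[tree] and E[property] for each candidate attribute.

```python
import math
from collections import Counter

# Rows from the credit-risk table: (risk, credit history, debt, collateral, income)
data = [
    ("high", "bad", "high", "none", "0-15"),
    ("high", "unknown", "high", "none", "15-35"),
    ("moderate", "unknown", "low", "none", "15-35"),
    ("high", "unknown", "low", "none", "0-15"),
    ("low", "unknown", "low", "none", "over-35"),
    ("low", "unknown", "low", "adequate", "over-35"),
    ("high", "bad", "low", "none", "0-15"),
    ("moderate", "bad", "low", "adequate", "over-35"),
    ("low", "good", "low", "none", "over-35"),
    ("low", "good", "high", "adequate", "over-35"),
    ("high", "good", "high", "none", "0-15"),
    ("moderate", "good", "high", "none", "15-35"),
    ("low", "good", "high", "none", "over-35"),
    ("high", "bad", "high", "none", "15-35"),
]
COLUMNS = {"history": 1, "debt": 2, "collateral": 3, "income": 4}

def info(rows):
    """I = sum_i -prob(classification_i) * log2(prob(classification_i))."""
    total = len(rows)
    return sum(-(n / total) * math.log2(n / total)
               for n in Counter(r[0] for r in rows).values())

def expected_info(rows, column):
    """E[property] = sum over subtrees of (# in subtree / # of samples) * I[subtree]."""
    idx, total = COLUMNS[column], len(rows)
    return sum((n / total) * info([r for r in rows if r[idx] == value])
               for value, n in Counter(r[idx] for r in rows).items())

print(round(info(data), 3))                           # 1.531, as on the I[tree] slide
for prop in COLUMNS:
    print(prop, round(expected_info(data, prop), 2))  # E[income] is the smallest, so income is
                                                      # split first (small differences from the
                                                      # slide come from its rounded intermediate terms)
```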

2.3- Missing Values. What is the problem? During computation of the splitting predicate, we can selectively ignore records with missing values (note that this has some problems). But if a record r is missing the value of the splitting attribute, r cannot participate further in tree construction. Algorithms for missing values address this problem. The simplest approach: if X is a numerical attribute, impute the overall mean; if X is a discrete (categorical) attribute, impute the most common value.
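A minimal sketch of that simplest imputation rule; the column-wise helper and the sample values are illustrative assumptions.

```python
from collections import Counter

def impute_column(values):
    """Fill missing (None) entries: the overall mean for a numerical attribute,
    the most common value for a discrete/categorical attribute."""
    present = [v for v in values if v is not None]
    if all(isinstance(v, (int, float)) for v in present):
        fill = sum(present) / len(present)              # overall mean
    else:
        fill = Counter(present).most_common(1)[0][0]    # most common value
    return [fill if v is None else v for v in values]

print(impute_column([125, 100, None, 120]))              # -> [125, 100, 115.0, 120]
print(impute_column(["Small", None, "Small", "Large"]))  # -> ['Small', 'Small', 'Small', 'Large']
```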

Thank you. End of Chapter 18, part 2.