Induction of Decision Trees. An Example Data Set and Decision Tree
|
|
- Pamela Harper
- 6 years ago
- Views:
Transcription
1 Induction of Decision Trees Blaž Zupan, Ivan Bratko magixfriuni-ljsi/predavanja/uisp An Example Data Set and Decision Tree # Attribute Class Outlook Company Sailboat Sail? 1 sunny big small 2 sunny med small 3 sunny med big 4 sunny small sunny outlook rainy 5 sunny big big 6 rainy small company 7 rainy med small 8 rainy big big 9 rainy big med big 10 rainy med big sailboat small big 1
2 Classification # Attribute Class Outlook Company Sailboat Sail? 1 sunny big? 2 rainy big small? sunny outlook rainy company med big sailboat small big Induction of Decision Trees Data Set (Learning Set) Each example = Attributes + Class Induced description = Decision tree TDIDT Top Down Induction of Decision Trees Recursive artitioning 2
3 Some TDIDT Systems ID3 (Quinlan 79) CART (Brieman et al 84) Assistant (Cestnik et al 87) C45 (Quinlan 93) See5 (Quinlan 97) Orange (Demšar, Zupan 98-03) Analysis of Severe Trauma atients Data H_ICU The worst ph value at ICU < >733 Death 00 (0/15) AT_WORST Well 088 (14/16) Well 082 (9/11) <787 >=787 Death 00 (0/7) The worst active partial thromboplastin time H_ICU and AT_WORST are exactly the two factors (theoretically) advocated to be the most important ones in the study by Rotondo et al,
4 Breast Cancer Recurrence Degree of Malig < 3 >= 3 Tumor Size Involved Nodes < 15 >= 15 < 3 >= 3 Age rec 125 recurr 39 rec 30 recurr 18 recurr 27 _rec 10 rec 4 recurr 1 rec 32 recurr 0 Tree induced by Assistant rofessional Interesting: Accuracy of this tree compared to medical specialists rostate cancer recurrence Secondary Gleason Grade 1, No SA Level Stage Yes 149 >149 T1c,T2a, T2b,T2c T1ab,T3 No rimary Gleason Grade No Yes 2,3 4 No Yes 4
5 TDIDT Algorithm Also kwn as ID3 (Quinlan) To construct decision tree T from learning set S: If all examples in S belong to some class C Then make leaf labeled C Otherwise select the most informative attribute A partition S according to A s values recursively construct subtrees T1, T2,, for the subsets of S TDIDT Algorithm Resulting tree T is: A Attribute A v1 v2 vn A s values T1 T2 Tn Subtrees 5
6 Ather Example # Attribute Class Outlook Temperature Humidity Windy lay 1 sunny hot high N 2 sunny hot high N 3 overcast hot high 4 rainy moderate high 5 rainy cold rmal 6 rainy cold rmal N 7 overcast cold rmal 8 sunny moderate high N 9 sunny cold rmal 10 rainy moderate rmal 11 sunny moderate rmal 12 overcast moderate high 13 overcast hot rmal 14 rainy moderate high N Simple Tree Outlook sunny overcast rainy Humidity Windy high rmal N N 6
7 Complicated Tree Temperature cold moderate hot Outlook Outlook Windy sunny overcast rainy sunny overcast rainy Windy Windy Humidity N Humidity high rmal high rmal N N Windy Outlook sunny overcast rainy N N null Attribute Selection Criteria Main principle Select attribute which partitions the learning set into subsets as pure as possible Various measures of purity Information-theoretic Gini index X 2 ReliefF Various improvements probability estimates rmalization binarization, subsetting 7
8 Information-Theoretic Approach To classify an object, a certain information is needed I, information After we have learned the value of A, we only need some remaining amount of information to classify the object Ires, residual information Gain Gain(A) = I Ires(A) The most informative attribute is the one that minimizes Ires, ie, maximizes Gain Entropy The average amount of information I needed to classify an object is given by the entropy measure For a two-class problem: entropy p(c1) 8
9 Residual Information After applying attribute A, S is partitioned into subsets according to values v of A Ires is equal to weighted sum of the amounts of information for the subsets Triangles and Squares # Attribute Shape Color Outline Dot 1 green dashed triange 2 green dashed triange 3 yellow dashed square 4 red dashed square 5 red solid square 6 red solid triange 7 green solid square 8 green dashed triange 9 yellow solid square 10 red solid square 11 green solid square 12 yellow dashed square 13 yellow solid square 14 red dashed triange 9
10 Triangles and Squares # Attribute Shape Color Outline Dot 1 green dashed triange 2 green dashed triange 3 yellow dashed square 4 red dashed square 5 red solid square 6 red solid triange 7 green solid square 8 green dashed triange 9 yellow solid square 10 red solid square 11 green solid square 12 yellow dashed square 13 yellow solid square 14 red dashed triange Data Set: A set of classified objects Entropy 5 triangles 9 squares class probabilities entropy 10
11 red Entropy reduction by data set partitioning Color? yellow green Entropija vredsti atributa Color? red yellow green 11
12 Information Gain Color? red yellow green Information Gain of The Attribute Attributes Gain(Color) = 0246 Gain(Outline) = 0151 Gain(Dot) = 0048 Heuristics: attribute with the highest gain is chosen This heuristics is local (local minimization of impurity) 12
13 red Color? green yellow Gain(Outline) = = 0971 bits Gain(Dot) = = 0020 bits red Color? Gain(Outline) = = 0020 bits Gain(Dot) = = 0971 bits green yellow Outline? dashed solid 13
14 14 Color? red yellow green dashed solid Dot? Outline? Decision Tree Color Dot Outline square red yellow green square triangle square triangle dashed solid
15 A Defect of Ires Ires favors attributes with many values Such attribute splits S to many subsets, and if these are small, they will tend to be pure anyway One way to rectify this is through a corrected measure of information gain ratio Information Gain Ratio I(A) is amount of information needed to determine the value of an attribute A Information gain ratio 15
16 Information Gain Ratio Color? red yellow green Information Gain and Information Gain Ratio A v(a) Gain(A) GainRatio(A) Color Outline Dot
17 Gini Index Ather sensible measure of impurity (i and j are classes) After applying attribute A, the resulting Gini index is Gini can be interpreted as expected error rate Gini Index 17
18 Gini Index for Color Color? red yellow green Gain of Gini Index 18
19 Three Impurity Measures A Gain(A) GainRatio(A) GiniGain(A) Color Outline Dot These impurity measures assess the effect of a single attribute Criterion most informative that they define is local (and myopic ) It does t reliably predict the effect of several attributes applied jointly Orange: Shapes Data Set shapetab Color Outline Dot Shape d d d d class green dashed triange green dashed triange yellow dashed square red dashed square red solid square red solid triange green solid square green dashed triange yellow solid square red solid square green solid square yellow dashed square yellow solid square 19
20 Orange: Impurity Measures import orange data = orangeexampletable('shape') gain = orangemeasureattribute_info gainratio = orangemeasureattribute_gainratio gini = orangemeasureattribute_gini print print "%15s %-8s %-8s %-8s" % ("name", "gain", "g ratio", "gini") for attr in datadomainattributes: print "%15s %43f %43f %43f" % \ (attrname, gain(attr, data), gainratio(attr, data), gini(attr, data)) name gain g ratio gini Color Outline Dot Orange: orngtree import orange, orngtree data = orangeexampletable('shape') tree = orngtreetreelearner(data) orngtreeprinttxt(tree) print '\nwith contingency vector:' orngtreeprinttxt(tree, internalnodefields=['contingency'], leaffields=['contingency']) Color green: Outline dashed: triange (1000%) Outline solid: square (1000%) Color yellow: square (1000%) Color red: Dot : square (1000%) Dot : triange (1000%) With contingency vector: Color (<5, 9>) green: Outline (<3, 2>) dashed: triange (<3, 0>) Outline (<3, 2>) solid: square (<0, 2>) Color (<5, 9>) yellow: square (<0, 4>) Color (<5, 9>) red: Dot (<2, 3>) : square (<0, 3>) Dot (<2, 3>) : triange (<2, 0>) 20
21 Orange: Saving to DOT import orange, orngtree data = orangeexampletable('shape') tree = orngtreetreelearner(data) orngtreeprintdot(tree, 'shapedot', leafshape='box', internalnodeshape='ellipse') > dot -Tgif shapedot > shapegif DB Miner: visualization 21
22 SGI MineSet: visualization 22
Induction of Decision Trees
Induction of Decision Trees Blaž Zupan and Ivan Bratko magixfriuni-ljsi/predavanja/uisp An Example Data Set and Decision Tree # Attribute Class Outlook Company Sailboat Sail? 1 sunny big small yes 2 sunny
More informationDecision Trees Dr. G. Bharadwaja Kumar VIT Chennai
Decision Trees Decision Tree Decision Trees (DTs) are a nonparametric supervised learning method used for classification and regression. The goal is to create a model that predicts the value of a target
More informationISSUES IN DECISION TREE LEARNING
ISSUES IN DECISION TREE LEARNING Handling Continuous Attributes Other attribute selection measures Overfitting-Pruning Handling of missing values Incremental Induction of Decision Tree 1 DECISION TREE
More informationClassification with Decision Tree Induction
Classification with Decision Tree Induction This algorithm makes Classification Decision for a test sample with the help of tree like structure (Similar to Binary Tree OR k-ary tree) Nodes in the tree
More informationDecision tree learning
Decision tree learning Andrea Passerini passerini@disi.unitn.it Machine Learning Learning the concept Go to lesson OUTLOOK Rain Overcast Sunny TRANSPORTATION LESSON NO Uncovered Covered Theoretical Practical
More informationCOMP 465: Data Mining Classification Basics
Supervised vs. Unsupervised Learning COMP 465: Data Mining Classification Basics Slides Adapted From : Jiawei Han, Micheline Kamber & Jian Pei Data Mining: Concepts and Techniques, 3 rd ed. Supervised
More informationData Mining. 3.2 Decision Tree Classifier. Fall Instructor: Dr. Masoud Yaghini. Chapter 5: Decision Tree Classifier
Data Mining 3.2 Decision Tree Classifier Fall 2008 Instructor: Dr. Masoud Yaghini Outline Introduction Basic Algorithm for Decision Tree Induction Attribute Selection Measures Information Gain Gain Ratio
More informationLecture 5: Decision Trees (Part II)
Lecture 5: Decision Trees (Part II) Dealing with noise in the data Overfitting Pruning Dealing with missing attribute values Dealing with attributes with multiple values Integrating costs into node choice
More informationExtra readings beyond the lecture slides are important:
1 Notes To preview next lecture: Check the lecture notes, if slides are not available: http://web.cse.ohio-state.edu/~sun.397/courses/au2017/cse5243-new.html Check UIUC course on the same topic. All their
More informationData Mining: Concepts and Techniques Classification and Prediction Chapter 6.1-3
Data Mining: Concepts and Techniques Classification and Prediction Chapter 6.1-3 January 25, 2007 CSE-4412: Data Mining 1 Chapter 6 Classification and Prediction 1. What is classification? What is prediction?
More informationBITS F464: MACHINE LEARNING
BITS F464: MACHINE LEARNING Lecture-16: Decision Tree (contd.) + Random Forest Dr. Kamlesh Tiwari Assistant Professor Department of Computer Science and Information Systems Engineering, BITS Pilani, Rajasthan-333031
More informationData Mining and Analytics
Data Mining and Analytics Aik Choon Tan, Ph.D. Associate Professor of Bioinformatics Division of Medical Oncology Department of Medicine aikchoon.tan@ucdenver.edu 9/22/2017 http://tanlab.ucdenver.edu/labhomepage/teaching/bsbt6111/
More information9/6/14. Our first learning algorithm. Comp 135 Introduction to Machine Learning and Data Mining. knn Algorithm. knn Algorithm (simple form)
Comp 135 Introduction to Machine Learning and Data Mining Our first learning algorithm How would you classify the next example? Fall 2014 Professor: Roni Khardon Computer Science Tufts University o o o
More informationNominal Data. May not have a numerical representation Distance measures might not make sense. PR and ANN
NonMetric Data Nominal Data So far we consider patterns to be represented by feature vectors of real or integer values Easy to come up with a distance (similarity) measure by using a variety of mathematical
More informationLecture 5 of 42. Decision Trees, Occam s Razor, and Overfitting
Lecture 5 of 42 Decision Trees, Occam s Razor, and Overfitting Friday, 01 February 2008 William H. Hsu, KSU http://www.cis.ksu.edu/~bhsu Readings: Chapter 3.6-3.8, Mitchell Lecture Outline Read Sections
More informationClassification. Instructor: Wei Ding
Classification Decision Tree Instructor: Wei Ding Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 Preliminaries Each data record is characterized by a tuple (x, y), where x is the attribute
More informationIntroduction to Machine Learning
Introduction to Machine Learning Decision Tree Example Three variables: Attribute 1: Hair = {blond, dark} Attribute 2: Height = {tall, short} Class: Country = {Gromland, Polvia} CS4375 --- Fall 2018 a
More informationData Mining Algorithms: Basic Methods
Algorithms: The basic methods Inferring rudimentary rules Data Mining Algorithms: Basic Methods Chapter 4 of Data Mining Statistical modeling Constructing decision trees Constructing rules Association
More informationPart I. Instructor: Wei Ding
Classification Part I Instructor: Wei Ding Tan,Steinbach, Kumar Introduction to Data Mining 4/18/2004 1 Classification: Definition Given a collection of records (training set ) Each record contains a set
More informationDecision Tree Learning
Decision Tree Learning 1 Simple example of object classification Instances Size Color Shape C(x) x1 small red circle positive x2 large red circle positive x3 small red triangle negative x4 large blue circle
More informationImplementierungstechniken für Hauptspeicherdatenbanksysteme Classification: Decision Trees
Implementierungstechniken für Hauptspeicherdatenbanksysteme Classification: Decision Trees Dominik Vinan February 6, 2018 Abstract Decision Trees are a well-known part of most modern Machine Learning toolboxes.
More informationClassification: Basic Concepts, Decision Trees, and Model Evaluation
Classification: Basic Concepts, Decision Trees, and Model Evaluation Data Warehousing and Mining Lecture 4 by Hossen Asiful Mustafa Classification: Definition Given a collection of records (training set
More informationCSE4334/5334 DATA MINING
CSE4334/5334 DATA MINING Lecture 4: Classification (1) CSE4334/5334 Data Mining, Fall 2014 Department of Computer Science and Engineering, University of Texas at Arlington Chengkai Li (Slides courtesy
More informationNominal Data. May not have a numerical representation Distance measures might not make sense PR, ANN, & ML
Decision Trees Nominal Data So far we consider patterns to be represented by feature vectors of real or integer values Easy to come up with a distance (similarity) measure by using a variety of mathematical
More information8. Tree-based approaches
Foundations of Machine Learning École Centrale Paris Fall 2015 8. Tree-based approaches Chloé-Agathe Azencott Centre for Computational Biology, Mines ParisTech chloe agathe.azencott@mines paristech.fr
More informationCS Machine Learning
CS 60050 Machine Learning Decision Tree Classifier Slides taken from course materials of Tan, Steinbach, Kumar 10 10 Illustrating Classification Task Tid Attrib1 Attrib2 Attrib3 Class 1 Yes Large 125K
More informationLecture outline. Decision-tree classification
Lecture outline Decision-tree classification Decision Trees Decision tree A flow-chart-like tree structure Internal node denotes a test on an attribute Branch represents an outcome of the test Leaf nodes
More informationDecision Tree Learning
Decision Tree Learning Debapriyo Majumdar Data Mining Fall 2014 Indian Statistical Institute Kolkata August 25, 2014 Example: Age, Income and Owning a flat Monthly income (thousand rupees) 250 200 150
More informationData Mining Practical Machine Learning Tools and Techniques
Decision trees Extending previous approach: Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 6 of Data Mining by I. H. Witten and E. Frank to permit numeric s: straightforward
More informationBusiness Club. Decision Trees
Business Club Decision Trees Business Club Analytics Team December 2017 Index 1. Motivation- A Case Study 2. The Trees a. What is a decision tree b. Representation 3. Regression v/s Classification 4. Building
More informationDecision Tree CE-717 : Machine Learning Sharif University of Technology
Decision Tree CE-717 : Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Some slides have been adapted from: Prof. Tom Mitchell Decision tree Approximating functions of usually discrete
More informationData Mining. Decision Tree. Hamid Beigy. Sharif University of Technology. Fall 1396
Data Mining Decision Tree Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1396 1 / 24 Table of contents 1 Introduction 2 Decision tree
More informationSupervised Learning. Decision trees Artificial neural nets K-nearest neighbor Support vectors Linear regression Logistic regression...
Supervised Learning Decision trees Artificial neural nets K-nearest neighbor Support vectors Linear regression Logistic regression... Supervised Learning y=f(x): true function (usually not known) D: training
More informationAdvanced learning algorithms
Advanced learning algorithms Extending decision trees; Extraction of good classification rules; Support vector machines; Weighted instance-based learning; Design of Model Tree Clustering Association Mining
More informationMachine Learning in Real World: C4.5
Machine Learning in Real World: C4.5 Industrial-strength algorithms For an algorithm to be useful in a wide range of realworld applications it must: Permit numeric attributes with adaptive discretization
More information7. Decision or classification trees
7. Decision or classification trees Next we are going to consider a rather different approach from those presented so far to machine learning that use one of the most common and important data structure,
More informationCLASSIFICATION OF C4.5 AND CART ALGORITHMS USING DECISION TREE METHOD
CLASSIFICATION OF C4.5 AND CART ALGORITHMS USING DECISION TREE METHOD Khin Lay Myint 1, Aye Aye Cho 2, Aye Mon Win 3 1 Lecturer, Faculty of Information Science, University of Computer Studies, Hinthada,
More informationChapter ML:III. III. Decision Trees. Decision Trees Basics Impurity Functions Decision Tree Algorithms Decision Tree Pruning
Chapter ML:III III. Decision Trees Decision Trees Basics Impurity Functions Decision Tree Algorithms Decision Tree Pruning ML:III-67 Decision Trees STEIN/LETTMANN 2005-2017 ID3 Algorithm [Quinlan 1986]
More informationOutline. RainForest A Framework for Fast Decision Tree Construction of Large Datasets. Introduction. Introduction. Introduction (cont d)
Outline RainForest A Framework for Fast Decision Tree Construction of Large Datasets resented by: ov. 25, 2004 1. 2. roblem Definition 3. 4. Family of Algorithms 5. 6. 2 Classification is an important
More informationUniversity of Cambridge Engineering Part IIB Paper 4F10: Statistical Pattern Processing Handout 10: Decision Trees
University of Cambridge Engineering Part IIB Paper 4F10: Statistical Pattern Processing Handout 10: Decision Trees colour=green? size>20cm? colour=red? watermelon size>5cm? size>5cm? colour=yellow? apple
More informationMIT 801. Machine Learning I. [Presented by Anna Bosman] 16 February 2018
MIT 801 [Presented by Anna Bosman] 16 February 2018 Machine Learning What is machine learning? Artificial Intelligence? Yes as we know it. What is intelligence? The ability to acquire and apply knowledge
More informationIntroduction to Machine Learning CANB 7640
Introduction to Machine Learning CANB 7640 Aik Choon Tan, Ph.D. Associate Professor of Bioinformatics Division of Medical Oncology Department of Medicine aikchoon.tan@ucdenver.edu 9/5/2017 http://tanlab.ucdenver.edu/labhomepage/teaching/canb7640/
More informationDecision Trees. Query Selection
CART Query Selection Key Question: Given a partial tree down to node N, what feature s should we choose for the property test T? The obvious heuristic is to choose the feature that yields as big a decrease
More informationClassification: Decision Trees
Metodologie per Sistemi Intelligenti Classification: Decision Trees Prof. Pier Luca Lanzi Laurea in Ingegneria Informatica Politecnico di Milano Polo regionale di Como Lecture outline What is a decision
More informationUniversity of Cambridge Engineering Part IIB Paper 4F10: Statistical Pattern Processing Handout 10: Decision Trees
University of Cambridge Engineering Part IIB Paper 4F10: Statistical Pattern Processing Handout 10: Decision Trees colour=green? size>20cm? colour=red? watermelon size>5cm? size>5cm? colour=yellow? apple
More informationData Mining in Bioinformatics Day 1: Classification
Data Mining in Bioinformatics Day 1: Classification Karsten Borgwardt February 18 to March 1, 2013 Machine Learning & Computational Biology Research Group Max Planck Institute Tübingen and Eberhard Karls
More informationInstance-Based Representations. k-nearest Neighbor. k-nearest Neighbor. k-nearest Neighbor. exemplars + distance measure. Challenges.
Instance-Based Representations exemplars + distance measure Challenges. algorithm: IB1 classify based on majority class of k nearest neighbors learned structure is not explicitly represented choosing k
More informationLars Schmidt-Thieme, Information Systems and Machine Learning Lab (ISMLL), University of Hildesheim, Germany
Syllabus Fri. 27.10. (1) 0. Introduction A. Supervised Learning: Linear Models & Fundamentals Fri. 3.11. (2) A.1 Linear Regression Fri. 10.11. (3) A.2 Linear Classification Fri. 17.11. (4) A.3 Regularization
More informationRule induction. Dr Beatriz de la Iglesia
Rule induction Dr Beatriz de la Iglesia email: b.iglesia@uea.ac.uk Outline What are rules? Rule Evaluation Classification rules Association rules 2 Rule induction (RI) As their name suggests, RI algorithms
More informationMachine Learning Chapter 2. Input
Machine Learning Chapter 2. Input 2 Input: Concepts, instances, attributes Terminology What s a concept? Classification, association, clustering, numeric prediction What s in an example? Relations, flat
More informationData Mining Classification - Part 1 -
Data Mining Classification - Part 1 - Universität Mannheim Bizer: Data Mining I FSS2019 (Version: 20.2.2018) Slide 1 Outline 1. What is Classification? 2. K-Nearest-Neighbors 3. Decision Trees 4. Model
More informationCONCEPT FORMATION AND DECISION TREE INDUCTION USING THE GENETIC PROGRAMMING PARADIGM
1 CONCEPT FORMATION AND DECISION TREE INDUCTION USING THE GENETIC PROGRAMMING PARADIGM John R. Koza Computer Science Department Stanford University Stanford, California 94305 USA E-MAIL: Koza@Sunburn.Stanford.Edu
More informationRepresenting structural patterns: Reading Material: Chapter 3 of the textbook by Witten
Representing structural patterns: Plain Classification rules Decision Tree Rules with exceptions Relational solution Tree for Numerical Prediction Instance-based presentation Reading Material: Chapter
More informationMachine Learning. A. Supervised Learning A.7. Decision Trees. Lars Schmidt-Thieme
Machine Learning A. Supervised Learning A.7. Decision Trees Lars Schmidt-Thieme Information Systems and Machine Learning Lab (ISMLL) Institute for Computer Science University of Hildesheim, Germany 1 /
More informationCMPUT 391 Database Management Systems. Data Mining. Textbook: Chapter (without 17.10)
CMPUT 391 Database Management Systems Data Mining Textbook: Chapter 17.7-17.11 (without 17.10) University of Alberta 1 Overview Motivation KDD and Data Mining Association Rules Clustering Classification
More informationDESIGN AND IMPLEMENTATION OF BUILDING DECISION TREE USING C4.5 ALGORITHM
1 Proceedings of SEAMS-GMU Conference 2007 DESIGN AND IMPLEMENTATION OF BUILDING DECISION TREE USING C4.5 ALGORITHM KUSRINI Abstract. Decision tree is one of data mining techniques that is applied in classification
More informationCoefficient of Variation based Decision Tree (CvDT)
Coefficient of Variation based Decision Tree (CvDT) Hima Bindu K #1, Swarupa Rani K #2, Raghavendra Rao C #3 # Department of Computer and Information Sciences, University of Hyderabad Hyderabad, 500046,
More informationKnowledge Discovery and Data Mining
Knowledge Discovery and Data Mining Lecture 10 - Classification trees Tom Kelsey School of Computer Science University of St Andrews http://tom.home.cs.st-andrews.ac.uk twk@st-andrews.ac.uk Tom Kelsey
More informationData Mining Part 4. Tony C Smith WEKA Machine Learning Group Department of Computer Science University of Waikato
Data Mining Part 4 Tony C Smith WEKA Machine Learning Group Department of Computer Science University of Waikato Algorithms: The basic methods Inferring rudimentary rules Statistical modeling Constructing
More informationClassification and Regression Trees
Classification and Regression Trees David S. Rosenberg New York University April 3, 2018 David S. Rosenberg (New York University) DS-GA 1003 / CSCI-GA 2567 April 3, 2018 1 / 51 Contents 1 Trees 2 Regression
More informationData Mining Concepts & Techniques
Data Mining Concepts & Techniques Lecture No. 03 Data Processing, Data Mining Naeem Ahmed Email: naeemmahoto@gmail.com Department of Software Engineering Mehran Univeristy of Engineering and Technology
More informationDecision Trees. Petr Pošík. Czech Technical University in Prague Faculty of Electrical Engineering Dept. of Cybernetics
Decision Trees Petr Pošík Czech Technical University in Prague Faculty of Electrical Engineering Dept. of Cybernetics This lecture is largely based on the book Artificial Intelligence: A Modern Approach,
More informationClassification/Regression Trees and Random Forests
Classification/Regression Trees and Random Forests Fabio G. Cozman - fgcozman@usp.br November 6, 2018 Classification tree Consider binary class variable Y and features X 1,..., X n. Decide Ŷ after a series
More informationData Mining Practical Machine Learning Tools and Techniques
Input: Concepts, instances, attributes Data ining Practical achine Learning Tools and Techniques Slides for Chapter 2 of Data ining by I. H. Witten and E. rank Terminology What s a concept z Classification,
More informationTillämpad Artificiell Intelligens Applied Artificial Intelligence Tentamen , , MA:8. 1 Search (JM): 11 points
Lunds Tekniska Högskola EDA132 Institutionen för datavetenskap VT 2017 Tillämpad Artificiell Intelligens Applied Artificial Intelligence Tentamen 2016 03 15, 14.00 19.00, MA:8 You can give your answers
More informationLecture 2 :: Decision Trees Learning
Lecture 2 :: Decision Trees Learning 1 / 62 Designing a learning system What to learn? Learning setting. Learning mechanism. Evaluation. 2 / 62 Prediction task Figure 1: Prediction task :: Supervised learning
More informationLecture on Modeling Tools for Clustering & Regression
Lecture on Modeling Tools for Clustering & Regression CS 590.21 Analysis and Modeling of Brain Networks Department of Computer Science University of Crete Data Clustering Overview Organizing data into
More informationUnsupervised: no target value to predict
Clustering Unsupervised: no target value to predict Differences between models/algorithms: Exclusive vs. overlapping Deterministic vs. probabilistic Hierarchical vs. flat Incremental vs. batch learning
More informationData Mining. Practical Machine Learning Tools and Techniques. Slides for Chapter 3 of Data Mining by I. H. Witten, E. Frank and M. A.
Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 3 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Input: Concepts, instances, attributes Terminology What s a concept?
More informationData Mining and Machine Learning: Techniques and Algorithms
Instance based classification Data Mining and Machine Learning: Techniques and Algorithms Eneldo Loza Mencía eneldo@ke.tu-darmstadt.de Knowledge Engineering Group, TU Darmstadt International Week 2019,
More informationNesnelerin İnternetinde Veri Analizi
Nesnelerin İnternetinde Veri Analizi Bölüm 3. Classification in Data Streams w3.gazi.edu.tr/~suatozdemir Supervised vs. Unsupervised Learning (1) Supervised learning (classification) Supervision: The training
More informationMachine Learning. Decision Trees. Le Song /15-781, Spring Lecture 6, September 6, 2012 Based on slides from Eric Xing, CMU
Machine Learning 10-701/15-781, Spring 2008 Decision Trees Le Song Lecture 6, September 6, 2012 Based on slides from Eric Xing, CMU Reading: Chap. 1.6, CB & Chap 3, TM Learning non-linear functions f:
More information.. Cal Poly CSC 466: Knowledge Discovery from Data Alexander Dekhtyar.. for each element of the dataset we are given its class label.
.. Cal Poly CSC 466: Knowledge Discovery from Data Alexander Dekhtyar.. Data Mining: Classification/Supervised Learning Definitions Data. Consider a set A = {A 1,...,A n } of attributes, and an additional
More informationClassification and Regression
Classification and Regression Announcements Study guide for exam is on the LMS Sample exam will be posted by Monday Reminder that phase 3 oral presentations are being held next week during workshops Plan
More informationPart I. Classification & Decision Trees. Classification. Classification. Week 4 Based in part on slides from textbook, slides of Susan Holmes
Week 4 Based in part on slides from textbook, slides of Susan Holmes Part I Classification & Decision Trees October 19, 2012 1 / 1 2 / 1 Classification Classification Problem description We are given a
More informationClassification and Regression Trees
Classification and Regression Trees Matthew S. Shotwell, Ph.D. Department of Biostatistics Vanderbilt University School of Medicine Nashville, TN, USA March 16, 2018 Introduction trees partition feature
More informationCS229 Lecture notes. Raphael John Lamarre Townshend
CS229 Lecture notes Raphael John Lamarre Townshend Decision Trees We now turn our attention to decision trees, a simple yet flexible class of algorithms. We will first consider the non-linear, region-based
More informationData Mining. Practical Machine Learning Tools and Techniques. Slides for Chapter 3 of Data Mining by I. H. Witten, E. Frank and M. A.
Data Mining Practical Machine Learning Tools and Techniques Slides for Chapter 3 of Data Mining by I. H. Witten, E. Frank and M. A. Hall Output: Knowledge representation Tables Linear models Trees Rules
More informationExample of DT Apply Model Example Learn Model Hunt s Alg. Measures of Node Impurity DT Examples and Characteristics. Classification.
lassification-decision Trees, Slide 1/56 Classification Decision Trees Huiping Cao lassification-decision Trees, Slide 2/56 Examples of a Decision Tree Tid Refund Marital Status Taxable Income Cheat 1
More informationPractical Data Mining COMP-321B. Tutorial 1: Introduction to the WEKA Explorer
Practical Data Mining COMP-321B Tutorial 1: Introduction to the WEKA Explorer Gabi Schmidberger Mark Hall Richard Kirkby July 12, 2006 c 2006 University of Waikato 1 Setting up your Environment Before
More informationStatistical Pattern Recognition
Statistical Pattern Recognition Features and Feature Selection Hamid R. Rabiee Jafar Muhammadi Spring 2013 http://ce.sharif.edu/courses/91-92/2/ce725-1/ Agenda Features and Patterns The Curse of Size and
More informationLecture 7: Decision Trees
Lecture 7: Decision Trees Instructor: Outline 1 Geometric Perspective of Classification 2 Decision Trees Geometric Perspective of Classification Perspective of Classification Algorithmic Geometric Probabilistic...
More informationStatistical Pattern Recognition
Statistical Pattern Recognition Features and Feature Selection Hamid R. Rabiee Jafar Muhammadi Spring 2014 http://ce.sharif.edu/courses/92-93/2/ce725-2/ Agenda Features and Patterns The Curse of Size and
More informationInducing Decision Trees with an Ant Colony Optimization Algorithm
To Appear in Applied Soft Computing (2012), DOI: 10.1016/j.asoc.2012.05.028 1 Inducing Decision Trees with an Ant Colony Optimization Algorithm Fernando E. B. Otero, Alex A. Freitas, Colin G. Johnson School
More informationBasic Concepts Weka Workbench and its terminology
Changelog: 14 Oct, 30 Oct Basic Concepts Weka Workbench and its terminology Lecture Part Outline Concepts, instances, attributes How to prepare the input: ARFF, attributes, missing values, getting to know
More informationNotes based on: Data Mining for Business Intelligence
Chapter 9 Classification and Regression Trees Roger Bohn April 2017 Notes based on: Data Mining for Business Intelligence 1 Shmueli, Patel & Bruce 2 3 II. Results and Interpretation There are 1183 auction
More informationData Mining D E C I S I O N T R E E. Matteo Golfarelli
Data Mining D E C I S I O N T R E E Matteo Golfarelli Decision Tree It is one of the most widely used classification techniques that allows you to represent a set of classification rules with a tree. Tree:
More informationLazy Decision Trees Ronny Kohavi
Lazy Decision Trees Ronny Kohavi Data Mining and Visualization Group Silicon Graphics, Inc. Joint work with Jerry Friedman and Yeogirl Yun Stanford University Motivation: Average Impurity = / interesting
More informationChapter 4: Algorithms CS 795
Chapter 4: Algorithms CS 795 Inferring Rudimentary Rules 1R Single rule one level decision tree Pick each attribute and form a single level tree without overfitting and with minimal branches Pick that
More informationWhat Is Data Mining? CMPT 354: Database I -- Data Mining 2
Data Mining What Is Data Mining? Mining data mining knowledge Data mining is the non-trivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data CMPT
More informationThe digital copy of this thesis is protected by the Copyright Act 1994 (New Zealand).
http://waikato.researchgateway.ac.nz/ Research Commons at the University of Waikato Copyright Statement: The digital copy of this thesis is protected by the Copyright Act 1994 (New Zealand). The thesis
More informationHomework 1 Sample Solution
Homework 1 Sample Solution 1. Iris: All attributes of iris are numeric, therefore ID3 of weka cannt be applied to this data set. Contact-lenses: tear-prod-rate = reduced: none tear-prod-rate = normal astigmatism
More informationStatistical Pattern Recognition
Statistical Pattern Recognition Features and Feature Selection Hamid R. Rabiee Jafar Muhammadi Spring 2012 http://ce.sharif.edu/courses/90-91/2/ce725-1/ Agenda Features and Patterns The Curse of Size and
More informationCSE 5243 INTRO. TO DATA MINING
CSE 5243 INTRO. TO DATA MINING Classification (Basic Concepts) Huan Sun, CSE@The Ohio State University 09/12/2017 Slides adapted from UIUC CS412, Fall 2017, by Prof. Jiawei Han Classification: Basic Concepts
More informationData Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation. Lecture Notes for Chapter 4. Introduction to Data Mining
Data Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation Lecture Notes for Chapter 4 Introduction to Data Mining by Tan, Steinbach, Kumar (modified by Predrag Radivojac, 2017) Classification:
More informationNonparametric Classification Methods
Nonparametric Classification Methods We now examine some modern, computationally intensive methods for regression and classification. Recall that the LDA approach constructs a line (or plane or hyperplane)
More informationLecture 19: Decision trees
Lecture 19: Decision trees Reading: Section 8.1 STATS 202: Data mining and analysis November 10, 2017 1 / 17 Decision trees, 10,000 foot view R2 R5 t4 1. Find a partition of the space of predictors. X2
More informationARTIFICIAL INTELLIGENCE (CS 370D)
Princess Nora University Faculty of Computer & Information Systems ARTIFICIAL INTELLIGENCE (CS 370D) (CHAPTER-18) LEARNING FROM EXAMPLES DECISION TREES Outline 1- Introduction 2- know your data 3- Classification
More informationMachine Learning Lecture 11
Machine Learning Lecture 11 Random Forests 23.11.2017 Bastian Leibe RWTH Aachen http://www.vision.rwth-aachen.de leibe@vision.rwth-aachen.de Course Outline Fundamentals Bayes Decision Theory Probability
More informationImplementation of Classification Rules using Oracle PL/SQL
1 Implementation of Classification Rules using Oracle PL/SQL David Taniar 1 Gillian D cruz 1 J. Wenny Rahayu 2 1 School of Business Systems, Monash University, Australia Email: David.Taniar@infotech.monash.edu.au
More information