Fitting Classification and Regression Trees Using Statgraphics and R. Presented by Dr. Neil W. Polhemus
1 Fitting Classification and Regression Trees Using Statgraphics and R Presented by Dr. Neil W. Polhemus
2 Classification and Regression Trees Machine learning methods used to construct predictive models from data. They recursively partition the data space using simple binary decisions, and are commonly portrayed as a tree with a split at each decision node.
3 Example: Fisher's Iris Data [Figure: classification tree for Species, with splits Petal.length<2.45, Petal.width<1.75, Petal.length<4.95, and Sepal.length<5.15, and leaves labeled setosa, versicolor, and virginica with their class probabilities.]
4 Basic Model Structure Y: variable to be predicted. If categorical, we construct a classification tree. If continuous, we construct a regression tree. X1, X2, ..., Xp: predictor variables. May be either categorical or continuous.
5 Partitioning Algorithm Start at the root node. Among all variables Xj, find the split that minimizes the resulting average within-node deviance. For a continuous variable, the split is of the form Xj < c. For a discrete variable, the split divides the possible values into 2 distinct groups. If one or more stopping criteria are met after the split, stop. Otherwise, consider splitting the child nodes.
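The split search for a single continuous predictor can be sketched as follows. This is a minimal illustration, not the Statgraphics or R implementation: it assumes a regression tree whose within-node deviance is the sum of squared deviations from the node mean, and it assumes candidate cut points are the midpoints between adjacent distinct values. The function names `sum_sq` and `best_split` are hypothetical. Because the number of observations is fixed, minimizing the summed deviance is equivalent to minimizing its average.

```python
def sum_sq(values):
    """Within-node deviance for a regression node: squared deviations from the node mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (c, deviance) for the split x < c with the smallest summed child deviance."""
    pairs = sorted(zip(x, y))
    best_c, best_dev = None, float("inf")
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # identical values cannot be separated by a cut point
        c = (lo + hi) / 2  # assumed candidate: midpoint between adjacent values
        left = [yy for xx, yy in pairs if xx < c]
        right = [yy for xx, yy in pairs if xx >= c]
        dev = sum_sq(left) + sum_sq(right)
        if dev < best_dev:
            best_c, best_dev = c, dev
    return best_c, best_dev
```

A full partitioning pass would repeat this search over every predictor, keep the best split overall, and then recurse into the two child nodes until a stopping criterion is met.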
6 RMS Titanic
7 Sample Data File n=1,309 observations (passengers only) Source: Frank Harrell and Thomas Cason, University of Virginia
8 Data Input
9 Analysis Options
10 Partitioning Options
11 Node Impurity Measure the impurity in a tree using the residual mean deviance (RMD). n = number of observations in training set; k = number of leaves; p_i,j = proportion of data of same type as i at its assigned leaf j; Y_i = predicted value of observation i. [Equations: RMD for classification trees and for regression trees.]
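The RMD formulas themselves did not survive in this transcript. The sketch below assumes the conventional definition, as reported by the R tree package: the total deviance divided by n - k. For classification trees the deviance is taken to be -2 times the sum of log(p_i,j), and for regression trees the sum of squared differences between observed and predicted values. Both function names are hypothetical.

```python
import math


def rmd_classification(leaf_probs, n_leaves):
    """RMD for a classification tree (assumed form: -2 * sum(log p) / (n - k)).

    leaf_probs[i] is the proportion of observation i's own class
    at the leaf it was assigned to, so every entry must be > 0.
    """
    n = len(leaf_probs)
    deviance = -2.0 * sum(math.log(p) for p in leaf_probs)
    return deviance / (n - n_leaves)


def rmd_regression(observed, predicted, n_leaves):
    """RMD for a regression tree (assumed form: sum of squared residuals / (n - k))."""
    n = len(observed)
    deviance = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    return deviance / (n - n_leaves)
```

Under this definition a perfectly pure classification tree has an RMD of zero, and adding leaves lowers the deviance while also shrinking the divisor n - k.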
12 Analysis Window
13 Decision Tree [Figure: classification tree for survived, with splits on sex, pclass, age, fare, and sibsp.]
14 Decision Tree Options
15 Tree Structure * Note: based on complete cases only.
16 Node Probabilities * Note: based on complete cases only.
17 Classification Table * Note: based on both complete and partial cases.
18 Training and Validation Sets May separate the data into 2 sets: Training set used to build the tree. Test set used to estimate the tree misclassification percentages.
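A random split and the resulting misclassification percentage can be sketched as below. The 30% test fraction, the fixed seed, and the function names are illustrative assumptions; the slides do not say how Statgraphics assigns rows to each set.

```python
import random


def split_train_test(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of rows and return (training_rows, test_rows)."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def misclassification_percent(actual, predicted):
    """Percentage of cases whose predicted class differs from the actual class."""
    wrong = sum(a != p for a, p in zip(actual, predicted))
    return 100.0 * wrong / len(actual)
```

The tree would be fitted on the training rows only, and `misclassification_percent` would then be evaluated separately on each set so the two percentages can be compared.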
19 Compare Results
20 Pruning Options Reduces the complexity of the tree by removing branches.
21 Pruning by Cross-validation Runs a 10-fold cross-validation experiment. Builds 10 trees, leaving out 10% of the data each time, and averages the results. Uses all of the observations for both training and validation. Can be used to determine the optimal size for the tree by increasing the number of leaves until you see warnings such as:
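The way a 10-fold experiment partitions the observations can be sketched as follows. This is a minimal illustration with a hypothetical function name; the actual fold assignment used by Statgraphics or R may differ. Each observation is held out exactly once and used for training in the other nine rounds, which is why all of the data serves both purposes.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Return k (training, validation) index pairs covering observations 0..n-1."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for held_out in folds:
        held = set(held_out)
        training = [i for i in indices if i not in held]
        splits.append((training, held_out))
    return splits
```

With the 1,309 Titanic observations, each validation fold holds about 10% of the data, and one tree would be built on the remaining indices in each of the 10 rounds.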
22 Pruning Example First reduce within-node deviance to fit a complex tree.
23 Pruning Example [Figure: the resulting complex, unpruned classification tree for survived, with many splits on sex, pclass, age, fare, sibsp, parch, and embarked.]
24 Pruning Example Now reduce the number of leaves to 10 and select cross-validation. [Figure: 10-leaf classification tree for survived, with splits on sex, pclass, age, fare, and sibsp.]
25 10 Leaves is Too Complex
26 Finding Optimal Number of Leaves Step 1: Copy script from Statgraphics to Word and modify last line.
27 Finding Optimal Number of Leaves Step 2: Copy modified script to R and run it. Look for size with minimum deviance.
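Picking the size with minimum deviance from the cross-validation output can be sketched as below. The parallel lists mirror the `size` and `dev` components that the R tree package's `cv.tree` is understood to return; the function name is hypothetical, and preferring the smaller tree when deviances tie is an assumption made here for simplicity.

```python
def best_size(sizes, deviances):
    """Return the tree size with the smallest deviance (smaller size wins ties)."""
    # Tuples compare by deviance first, then by size.
    deviance, size = min(zip(deviances, sizes))
    return size


# Made-up placeholder values for illustration only,
# not output from the Titanic analysis.
example_sizes = [10, 8, 6, 4, 2, 1]
example_deviances = [520.0, 505.0, 498.0, 498.0, 560.0, 700.0]
chosen = best_size(example_sizes, example_deviances)
```

The chosen size would then be entered as the number of leaves when pruning the tree.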
28 Prune Tree
29 Final Tree [Figure: pruned classification tree for survived, with splits sex=female, pclass=3, age<9.5, and sibsp=3,4,5.]
30 Predict Additional Cases To make predictions for additional cases, add them to the bottom of the original data, leaving the cell for Y blank.
31 Predictions and Residuals
32 Example 2: World Bank Demographics
33 Decision Tree [Figure: regression tree for Life Expectancy, with splits on Fertility.Rate, GDP.per.Capita, Female.percentage, Pop..Density, and Age.Dependency.Ratio.]
34 References StatFolios and data files are at: R Package tree (2015). Classic text: Breiman, L., Friedman, J., Stone, C.J. and Olshen, R.A. (1998) Classification and Regression Trees. Wadsworth.