Regression trees with R

1 Regression trees with R
Emanuele Taufer

2 Implement decision trees with R
To fit a regression (or classification) tree we can use the tree() function from the tree library. The function automatically decides whether to build a regression tree or a classification tree from the class of the dependent variable:
- regression tree if Y is numeric
- classification tree if Y is a factor
The minimal input is very simple: it is sufficient to indicate the regression (or classification) formula and the data (similar to what has already been learned for the lm() function).
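
A minimal sketch of this behaviour (the data frame my.data and the variables y, x1, x2 are hypothetical, used only for illustration):

# y numeric -> tree() grows a regression tree
fit.reg <- tree(y ~ x1 + x2, data = my.data)
# y coded as a factor -> tree() grows a classification tree
fit.cla <- tree(factor(y) ~ x1 + x2, data = my.data)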

3 Example - Hitters
First load the tree library and the Hitters dataset from the ISLR library
library(tree)
library(ISLR)
data("Hitters")
## clean the data: drop rows with missing cases
Hitters <- Hitters[complete.cases(Hitters),]
To fit a tree with Salary = f(Years, Hits), use
h.tree <- tree(Salary ~ Hits + Years, Hitters)
summary(h.tree)
##
## Regression tree:
## tree(formula = Salary ~ Hits + Years, data = Hitters)
## Number of terminal nodes: 8
## Residual mean deviance: = / 255
## Distribution of residuals:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
##
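
As a possible follow-up (not part of the original slide), the fitted tree can be used with predict() to obtain a predicted salary for each player, and the training RSS can be recomputed by hand for comparison with the deviance reported by summary():

pred <- predict(h.tree, newdata = Hitters)   # predicted Salary for each player
sum((Hitters$Salary - pred)^2)               # training RSS of the fitted tree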

4 Plot of the tree
plot(h.tree, lwd=3)
text(h.tree, pretty=0, cex=1.2, col="blue")
In the diagram, the height of the branches indicates the effectiveness of the split in reducing the RSS. In the example, the first splits are the most effective.

5 Analysis of the tree
h.tree
## node), split, n, deviance, yval
## * denotes terminal node
##
##  1) root
##    2) Years <
##      4) Hits <  *
##      5) Hits >
##       10) Years <  *
##       11) Years >  *
##    3) Years >
##      6) Hits <
##       12) Years <  *
##       13) Years >  *
##      7) Hits >
##       14) Hits <
##         28) Years <  *
##         29) Years >  *
##       15) Hits >  *

6 Interpret the analytical output
To interpret the results in the previous slide, note that:
Terminal nodes are indicated with an asterisk (8 in this case). The others are internal nodes.
The first line of the output gives the key for reading the results:
- node): node number
- split: splitting criterion
- n: number of observations in the node
- deviance: the RSS in that node for regression trees; the value of the Gini index or entropy for classification trees
- yval: the value ȳ in that node, that is, the prediction if the node is terminal.
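
A small sketch, assuming the standard structure of objects returned by tree(): the same per-node information shown in the printed output is stored in the frame component of the fitted object and can be inspected directly.

h.tree$frame[, c("var", "n", "dev", "yval")]   # split variable, n, deviance and prediction for each node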

7 Example
Node 1): includes all 263 units of the dataset; the output reports the RSS and the average Salary for the whole sample.
Node 2): the first split is defined by the condition Years < 4.5; there are 90 players that satisfy the condition; the output reports the RSS for this group of players and the group's average Salary.
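
A minimal check (a sketch using the data already loaded): the quantities reported for node 2) can be reproduced by hand by selecting the players with Years < 4.5.

grp <- Hitters$Salary[Hitters$Years < 4.5]
length(grp)                # n for the node (the 90 players satisfying the condition)
mean(grp)                  # yval, the predicted Salary in the node
sum((grp - mean(grp))^2)   # deviance (RSS) of the node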

8 Control the parameters of the tree
The parameters for constructing the tree can be controlled with the tree.control() function
tree.control(nobs, mincut = 5, minsize = 10, mindev = 0.01)
Arguments:
- nobs: number of observations in the analyzed data set
- mincut: see help
- minsize: see help
- mindev: the minimum deviance required to proceed with a split; decrease (increase) the default value to obtain a larger (smaller) tree.
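
A short sketch of an alternative usage (per the tree package documentation, as far as I recall, mincut is the minimum number of observations allowed in either child node and minsize is the smallest allowed node size): the control settings can also be passed inline, without a separate setup object.

h.small <- tree(Salary ~ Hits + Years, Hitters,
                control = tree.control(nrow(Hitters), mindev = 0.05))   # larger mindev -> smaller tree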

9 Example
To obtain the tree shown on slide 5 of the decision-tree lecture, use
setup <- tree.control(nrow(Hitters), mincut = 5, minsize = 10, mindev = 0.15)
h.tree <- tree(Salary ~ Hits + Years, Hitters, control = setup)
Note that the new parameters have been stored in the setup object, which is then passed as the control argument of tree()

10
plot(h.tree, lwd=3)
text(h.tree, pretty=0, cex=1.2, col="blue")

11
To obtain a larger tree, use, for example
setup <- tree.control(nrow(Hitters), mincut = 5, minsize = 10, mindev = 0.003)
h.tree <- tree(Salary ~ ., Hitters, control = setup)
Note: the Salary ~ . syntax tells R to use Salary as the dependent variable and all the other variables in the dataset as predictors

12
plot(h.tree, lwd=3)
text(h.tree, pretty=0, cex=1.2, col="blue")

13
Use the option type="uniform" to obtain a more readable plot
plot(h.tree, lwd=3, type="uniform")
text(h.tree, pretty=0, cex=1.2, col="blue")

14 Prune the tree
The cv.tree() function uses cross-validation to determine the optimal level of tree complexity.
For regression trees use the FUN = prune.tree argument (in which case the RSS is the guiding criterion)
For classification trees use the FUN = prune.misclass argument (in which case the error rate is the guiding criterion)
The output of cv.tree() reports:
- the number of terminal nodes of each tree considered (size)
- the corresponding RSS (or error rate) (dev)
- other parameters (not discussed in class).
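
A sketch of a typical call, assuming the interface of cv.tree() in the tree package (the K argument sets the number of cross-validation folds, 10 by default):

cv.out <- cv.tree(h.tree, FUN = prune.tree, K = 10)
names(cv.out)   # "size" "dev" "k" "method"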

15 Output
We carry out a cross-validation of the tree obtained on slide 13 of this lab
set.seed(3)
cv.hitters <- cv.tree(h.tree, FUN = prune.tree)
cv.hitters
## $size
## [1]
##
## $dev
## [1]
## [8]
## [15]
##
## $k
## [1] -Inf
## [7]
## [13]
##
## $method
## [1] "deviance"
##
## attr(,"class")
## [1] "prune" "tree.sequence"

16 Plot
We plot the results for size and dev in a graph
plot(cv.hitters$size, cv.hitters$dev, type="b",
     lwd=3, col="blue",
     xlab="terminal nodes", ylab="RSS", main="Cost complexity pruning")
Cross-validation indicates that the optimal size is 3.
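
Instead of reading the optimal size off the plot, it can also be extracted programmatically (a small sketch using the objects created above):

best.size <- cv.hitters$size[which.min(cv.hitters$dev)]   # size with the smallest cross-validated deviance
best.size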

17 The pruned tree
At this point, use the prune.tree() function to prune the initial tree back to the number of nodes chosen on the basis of the cross-validation results.
prune.hitters <- prune.tree(h.tree, best = 3)
plot(prune.hitters, lwd=3)
text(prune.hitters, pretty=0, cex=1.2, col="blue")
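
As a possible follow-up (not in the slides), the pruned tree can be used exactly like the full tree, for example to compute predictions and the corresponding training RSS:

pred.pruned <- predict(prune.hitters, newdata = Hitters)
sum((Hitters$Salary - pred.pruned)^2)   # training RSS of the pruned tree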
