NEURAL NETWORKS

As an introduction, we'll tackle a prediction task with a continuous target variable. We'll reproduce research from the field of cement and concrete manufacturing that seeks to model the compressive strength of concrete using its age and ingredient content. This dataset was donated to the UCI Machine Learning Repository by Professor I-Cheng Yeh of Chung-Hua University in Taiwan. This tutorial was adapted from the one by Brett Lantz in Machine Learning with R (Packt Publishing, 2015).

The following predictor variables, all measured in kg per cubic meter of mix (aside from age, which is measured in days), are used to model the compressive strength of concrete, measured in MPa:

Cement
Blast Furnace Slag
Fly Ash
Water
Superplasticizer
Coarse Aggregate
Fine Aggregate
Age

The dataset has 1030 observations. All of the variables are numeric and take positive values. The scales vary quite a bit, since some ingredients are more prevalent than others.

> load("concrete.rdata")
> str(concrete)
'data.frame':   1030 obs. of  9 variables:
 $ cement      : num  141 169 250 266 155 ...
 $ slag        : num  212 42.2 0 114 183.4 ...
 $ ash         : num  0 124.3 95.7 0 0 ...
 $ water       : num  204 158 187 228 193 ...
 $ superplastic: num  0 10.8 5.5 0 9.1 0 0 6.4 0 9 ...
 $ coarseagg   : num  972 1081 957 932 1047 ...
 $ fineagg     : num  748 796 861 670 697 ...
 $ age         : int  28 14 28 28 28 90 7 56 28 28 ...
 $ strength    : num  29.9 23.5 29.2 45.9 18.3 ...
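The str() output already hints that the ingredient scales differ by an order of magnitude or more. If you want to confirm this directly, a quick check along the following lines can help (this snippet is our addition, not part of the original tutorial; its output is omitted):

> sapply(concrete, range)   # min and max of every column, confirming the very different scales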

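One practical note before taking the split below: sample() is random, so the rows chosen for training will differ from run to run, and the numbers printed later in this tutorial will not reproduce exactly. If reproducibility matters, fix a seed first; this line is our addition and the seed value is arbitrary:

> set.seed(2015)   # arbitrary seed so the random 80-20 split below is reproducible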
Let's take an 80-20 split of the data for training and validation and then explore some preliminary relationships with the target variable.

> TrainInd = sample(c(T,F), size=1030, replace=T, prob=c(0.8,0.2))
> train = concrete[TrainInd,]
> valid = concrete[!TrainInd,]

First, let's check out a couple of scatterplot matrices containing the target.

> pairs(strength~cement+slag+ash+water, data=train,
+       main="First Scatterplot Matrix")

[Figure: First Scatterplot Matrix - pairwise scatterplots of strength, cement, slag, ash, and water]

> pairs(strength~superplastic+coarseagg+fineagg+age, data=train,
+       main="Second Scatterplot Matrix")

[Figure: Second Scatterplot Matrix - pairwise scatterplots of strength, superplastic, coarseagg, fineagg, and age]

There certainly seem to be some relationships among the input variables, and some variables appear to be related to the target, but overall it is a little difficult to tell whether we will be able to build a good model for compressive strength from these inputs. Let's try a simple linear model with all of the input variables and see how it performs.

> linear = lm(strength~cement+slag+ash+water+superplastic+coarseagg+fineagg+age, data=train)
> summary(linear)

Call:
lm(formula = strength ~ cement + slag + ash + water + superplastic +
    coarseagg + fineagg + age, data = train)

Residuals:
     Min       1Q   Median       3Q      Max
 -28.724   -6.266    0.600    6.588   32.882

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  -2.945291  29.675529  -0.099   0.9210
cement        0.115508   0.009437  12.239  < 2e-16 ***
slag          0.100233   0.011394   8.797  < 2e-16 ***
ash           0.086760   0.013987   6.203 8.80e-10 ***
water        -0.178346   0.045034  -3.960 8.14e-05 ***
superplastic  0.263093   0.104180   2.525   0.0117 *
coarseagg     0.011785   0.010465   1.126   0.2604
fineagg       0.010877   0.011956   0.910   0.3632
age           0.106462   0.005740  18.549  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 10.38 on 818 degrees of freedom
Multiple R-squared:  0.6149,  Adjusted R-squared:  0.6111
F-statistic: 163.3 on 8 and 818 DF,  p-value: < 2.2e-16

Most of the variables appear to have a significant (linear) relationship with compressive strength, but the overall performance of the model leaves much to be desired (adjusted R^2 of about 0.61). Let's check the R^2 and MAPE on our validation data.

> linearpred = predict(linear, valid)
> linearrsq = cor(linearpred, valid$strength)
> linearMAPE = mean((linearpred-valid$strength)/valid$strength)
> cat("Linear regression R-squared:", linearrsq)
Linear regression R-squared: 0.7820822
> cat("Linear regression MAPE:", linearMAPE)
Linear regression MAPE: 0.1092112

We'd probably want a much smaller MAPE than this if we're predicting the strength of concrete used to build bridges! Let's see if we can improve upon it using neural networks. For this tutorial, we'll use the neuralnet package. Recall that neural networks work best when the input data are standardized to a narrow range near zero. Since our data take positive values up to about 1000, we'll standardize them. Let's first check the distributions of the variables to see whether statistical standardization (z-scores) seems appropriate.

> # 4 histograms arranged in 4 rows and 1 column
> attach(train)
> par(mfrow=c(4,1))
> hist(cement)
> hist(slag)
> hist(ash)
> hist(water)

[Figure: histograms of cement, slag, ash, and water]

> par(mfrow=c(4,1))
> hist(superplastic)
> hist(coarseagg)
> hist(fineagg)
> hist(age)

[Figure: histograms of superplastic, coarseagg, fineagg, and age]

While many of the variables have bell-shaped distributions, others, like age, do not. In this case it likely won't affect the results much whether we choose range standardization or statistical standardization, so you might try both. For the purposes of this tutorial, we'll go with range standardization, which also gives us a chance to practice writing a function. The scalerange function below works on a single numeric vector, and we'll need to apply it to every column of our data frame, including the target!

> scalerange = function(x) {
+   return((x-min(x))/(max(x)-min(x)))
+ }
> concrete_norm = as.data.frame(lapply(concrete, scalerange))
> train_norm = concrete_norm[TrainInd, ]
> valid_norm = concrete_norm[!TrainInd, ]

Let's get started with something simple, a neural network with only one hidden unit, and see how it performs compared to the linear regression. You can plot the resulting network structure with the associated weights using the plot() command.

> library(neuralnet)
> nnet1 = neuralnet(strength~cement+slag+ash+water+superplastic+coarseagg+fineagg+age, data=train_norm, hidden=1)
> # plot(nnet1)

[Figure: plot of nnet1 - network diagram for the one-hidden-unit model with the fitted weights on each connection; reported Error: 16.550649, Steps: 40]

To predict the strength variable on the validation data, we'll need the compute() function from the neuralnet package. It returns not only the final model output (in the $net.result component) but also the computed value at each neuron (in the $neurons component); we only want the former. We'll use the predicted values to calculate a validation R^2 as well as a MAPE.

> results1 = compute(nnet1, valid_norm[,1:8])
> nnet1pred = results1$net.result
> nnet1rsq = cor(nnet1pred, valid_norm$strength)
> nnet1mape = mean((nnet1pred-valid_norm$strength)/valid_norm$strength)
> cat("1 Hidden Unit R-squared:", nnet1rsq)
1 Hidden Unit R-squared: 0.840282435
> cat("1 Hidden Unit MAPE:", nnet1mape)
1 Hidden Unit MAPE: 0.08960350585

The validation R^2 from our model increases dramatically to about 0.84, but the MAPE calculation has a problem: to compare the predicted values with the actual concrete strengths accurately, we need to rescale them back to the original units. Worse, the range standardization created strength values of exactly 0 for the observations that had the minimum strength, and if any of those observations fall in our validation data, the MAPE will be undefined because of division by zero. To rescale our predictions to the original measure of strength, we multiply them by the range of the original strength variable and add back its minimum value. We then recompute both the R^2 (which shouldn't change, since we're only applying a linear transformation to our predictions) and the MAPE for the neural network model.

> nnet1predrescale = nnet1pred*(max(concrete$strength)-min(concrete$strength))+min(concrete$strength)
> nnet1rsq = cor(nnet1predrescale, valid$strength)
> nnet1mape = mean((nnet1predrescale-valid$strength)/valid$strength)
> cat("1 Hidden Unit R-squared after Rescaling:", nnet1rsq)
1 Hidden Unit R-squared after Rescaling: 0.840282435
> cat("1 Hidden Unit MAPE after Rescaling:", nnet1mape)
1 Hidden Unit MAPE after Rescaling: 0.06577119777

Our MAPE is starting to look like something we might have more confidence in! This was a very simple neural network model (a good place to start), so let's see how we do if we increase the number of hidden units, first to 3 and then to 5:

> nnet3 = neuralnet(strength~cement+slag+ash+water+superplastic+coarseagg+fineagg+age, data=train_norm, hidden=3)
> nnet5 = neuralnet(strength~cement+slag+ash+water+superplastic+coarseagg+fineagg+age, data=train_norm, hidden=5)
> results3 = compute(nnet3, valid_norm[,1:8])
> nnet3predrescale = results3$net.result*(max(concrete$strength)-min(concrete$strength))+min(concrete$strength)
> nnet3rsq = cor(nnet3predrescale, valid$strength)
> nnet3mape = mean((nnet3predrescale-valid$strength)/valid$strength)
> cat("3 Hidden Unit R-squared after Rescaling:", nnet3rsq)
3 Hidden Unit R-squared after Rescaling: 0.9196445059
> cat("3 Hidden Unit MAPE after Rescaling:", nnet3mape)
3 Hidden Unit MAPE after Rescaling: 0.0395292331
> results5 = compute(nnet5, valid_norm[,1:8])
> nnet5predrescale = results5$net.result*(max(concrete$strength)-min(concrete$strength))+min(concrete$strength)
> nnet5rsq = cor(nnet5predrescale, valid$strength)
> nnet5mape = mean((nnet5predrescale-valid$strength)/valid$strength)
> cat("5 Hidden Unit R-squared after Rescaling:", nnet5rsq)
5 Hidden Unit R-squared after Rescaling: 0.9353053032
> cat("5 Hidden Unit MAPE after Rescaling:", nnet5mape)
5 Hidden Unit MAPE after Rescaling: 0.01566389657
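The same compute-rescale-score sequence is repeated for each network above, so as a closing aside it can be convenient to wrap it in a small helper. The sketch below is our own addition and is not from the original tutorial: the function name and arguments are made up for illustration, and it computes the MAPE with abs() (the usual definition of mean absolute percentage error), whereas the calculations above omit abs(), so its numbers will differ somewhat from those printed above.

> # Hypothetical helper (our addition): score a fitted neuralnet model on normalized
> # validation inputs, reporting validation R-squared and MAPE on the original strength scale.
> evalNnet = function(model, newdata_norm, actual, orig_target) {
+   rng = max(orig_target) - min(orig_target)
+   pred_norm = compute(model, newdata_norm)$net.result     # predictions on the [0,1] scale
+   pred = as.numeric(pred_norm) * rng + min(orig_target)   # rescale back to MPa
+   c(Rsq  = cor(pred, actual),
+     MAPE = mean(abs((pred - actual) / actual)))           # true mean absolute percentage error
+ }
> # Example usage with the objects created above:
> # evalNnet(nnet5, valid_norm[,1:8], valid$strength, concrete$strength)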