HW 10 STAT 472, Spring 2018


1) (0 points) Do parts (a), (b), (c), and (e) of Exercise 2 on p. 298 of ISL.

2) (0 points) Do Exercise 3 on p. 298 of ISL.

3) For this problem, you can merely submit the things that I specifically request in the various parts, or you can submit some of your work in addition to the answers, but if you do that, be sure to highlight the specific things I request in yellow. (Note: Just submitting the bare minimum won't allow you to earn much partial credit for incorrect answers. If you're unsure about something, it may be better to provide some of your R code. (Or better yet, ask me about any troublesome parts of the assignment.))

Attach the Auto data set from the ISLR library. With this exercise, we're going to first use some of the one-predictor regression methods from Sections 7.1 through 7.6 in an attempt to explain miles per gallon using horsepower. But an examination of the scatter plot created using plot(mpg~horsepower) shows that we'll have appreciable heteroscedasticity if we use mpg as the response variable. (The variation in mpg generally increases as horsepower decreases.) So instead we'll use the inverse of the square root of mpg as the response variable, as we use horsepower as the sole predictor variable for the first portion of this assignment. (Note: I decided on this transformation of mpg by just trying a few things. By making a plot, you can see that not only does it make a constant error term variance assumption much more plausible, but it also creates a closer-to-linear relationship between the response and the predictor.)

However, later in the assignment we'll also incorporate some additional predictor variables (displacement and weight), and so it'll be best to go ahead and include them as the training and test data are created. Furthermore, let's also load the glmnet, boot, splines, gam, and tree libraries, since eventually they'll be needed. (Note: You might have to first install the gam and tree libraries if you've never used them, but you don't have to install splines before using it since it's part of the base installation.)

Now let's create training and test sets of our response and predictors as follows (being sure to set the seed of R's random number generator to 123 right before you create the train vector):

library(ISLR)   # this line and the attach() below are implied by the text above but were not in the transcribed code
library(glmnet)
library(boot)
library(splines)
library(gam)
library(tree)
attach(Auto)
y=1/sqrt(mpg)
set.seed(123)
train=sample(392,292,replace=FALSE)
train.dat=data.frame(cbind(y[train],displacement[train],horsepower[train],weight[train]))
test.dat=data.frame(cbind(y[-train],displacement[-train],horsepower[-train],weight[-train]))
names(train.dat)=c("y","disp","hp","wt")
names(test.dat)=c("y","disp","hp","wt")

Note that I've made the variable names y, disp, hp, and wt in order to make it easier to type in the various models we want to consider. In order to check things, enter

dim(train.dat)
head(train.dat)
dim(test.dat)
head(test.dat)

You should see that the dimension of train.dat is 292 by 4, and that the first 3 values of hp are 107, 60, and 105. You should also see that the dimension of test.dat is 100 by 4, and that the first 3 values of hp are 165, 150, and 140.
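To see the effect of the transformation the note above alludes to, side-by-side scatter plots can be compared. This is just a sketch of the kind of plot I experimented with, not a required part of the assignment:

par(mfrow=c(1,2))
plot(horsepower, mpg)   # spread of mpg grows as horsepower decreases
plot(horsepower, y)     # y = 1/sqrt(mpg): more constant spread, and closer to linear
par(mfrow=c(1,1))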

Now use the training data to fit a fourth-degree polynomial model having y as the response and hp as the predictor. Although there are a variety of ways that this can be done, please use

poly4=lm(y~poly(hp,4,raw=T), data=train.dat)
summary(poly4)

(since using the above with raw=F gives us a version that's not explained well in the text (nor the videos), and it's an unnecessary complication that we don't need to bother with).

(a) (1 point) What p-value results from the t test associated with the 4th-degree term in the model? (Round to the nearest thousandth (which may be indicating a bit too much accuracy, but the 3 digits will help me make sure that you've done everything correctly up to this point). You should get a large p-value, indicating that a 4th-order polynomial fit may not be necessary.)

Now fit a third-order polynomial model using:

poly3=lm(y~poly(hp,3,raw=T), data=train.dat)
summary(poly3)

(Note that R^2 did not decrease. You should see that the 3rd-order term has a small p-value, indicating that simplifying to a 2nd-order fit may not be good.)

Now let's check to see if our test set MSPE estimates indicate that the 3rd-order model is really superior to the 4th-order model. We can compute the estimated test MSPE for the 4th-order model as follows:

pred.test=predict(poly4, newdata=test.dat)
mean((pred.test-test.dat$y)^2)   # this line appears to have been dropped from the transcription

You should get a value of about .

(b) (1 point) Now give the estimated MSPE (based on the test data) for the 3rd-order model. (Report the value by rounding to 5 significant digits (so through the 8th digit after the decimal) so that I can confirm that you've done things correctly. You should see that while it's only a very tiny bit smaller than the estimate obtained from the 4th-order model, the simpler polynomial model did predict better.)

Now let's make a plot showing the 3rd-order polynomial fit, along with some standard error bands. This can be done by doing something similar to what is shown on the middle portion of p. 289 of the text, but I'll make a plot that is a bit less fancy as follows:

hp.lims=range(train.dat$hp)
hp.grid=seq(from=hp.lims[1], to=hp.lims[2])
preds=predict(poly3, newdata=list(hp=hp.grid), se=TRUE)
se.bands=cbind(preds$fit+2*preds$se.fit, preds$fit-2*preds$se.fit)
plot(train.dat$hp, train.dat$y)   # scatter plot of the training data (a call like this is needed before lines(); it seems to have been lost in transcription)
lines(hp.grid, preds$fit, lwd=2, col="blue")
matlines(hp.grid, se.bands, lwd=1, col="blue", lty=3)

(c) (1 point) Use R to produce such a plot and submit a hard copy of it. (You don't have to use any color if you don't have a good way to print color plots, but don't submit a hand-drawn plot! (These guidelines apply to all of the other plots requested in this assignment.))

Now let's fit some spline models. Since our polynomial fits indicate that the cubic polynomial is better than the 4th-degree polynomial, it may be that we don't need a lot of knots to get a good fit, and so let's just use one knot, located at 175. (Most of the curvature occurs upwards of 150, so one might be tempted to move the knot even higher. But since there's not a lot of data with values of hp greater than 175, it may be better not to move it any higher.) We can produce such a cubic spline fit, and plot it, as follows:

cubspl=lm(y~bs(hp,knots=c(175)), data=train.dat)
cs.pred=predict(cubspl, newdata=list(hp=hp.grid))
lines(hp.grid, cs.pred, lwd=2, col="blue")

The fitted curve doesn't look much different from the one for part (c), except that it turns down more sharply at the extreme right. (Note: Based on what's in the top shaded box on p. 293 of ISL, one might think that I should use cs.pred$fit instead of just cs.pred in the last line above, but since I didn't include se=T (like the text did) when I used predict(), the way I did it is appropriate.)

(d) (1 point) Now give the estimated MSPE, based on the test data, for the cubic spline model. (Report the value by rounding to 5 significant digits (so through the 8th place after the decimal).)
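To estimate the MSPE for the cubic spline fit, one can mirror the pattern used above for the 4th-order polynomial. Here is a minimal sketch (the name cs.test is arbitrary and isn't used elsewhere in this assignment):

cs.test=predict(cubspl, newdata=list(hp=test.dat$hp))   # predictions at the 100 test hp values
mean((cs.test-test.dat$y)^2)                            # estimated test MSPE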

For a natural spline, we can use a total of 5 knots and have the same number of parameters. But let's try using just 4 knots: two close to the ends of the range of hp values, at 70 and 210, and two more in the region where the curvature starts to become more pronounced, at 170 and 190. We can fit the spline and view the fit as follows:

nspl=lm(y~ns(hp,knots=c(70,170,190,210)), data=train.dat)
ns.pred=predict(nspl, newdata=list(hp=hp.grid))
lines(hp.grid, ns.pred, lwd=2, col="blue")

(e) (1 point) Now give the estimated MSPE, based on the test data, for the natural spline model. (Again, round to 5 significant digits.)

So now let's go to a smoothing spline fit, where we don't have to specify knot locations. First we'll let cross-validation select a value for the smoothing parameter and determine the corresponding effective degrees of freedom, and then we'll plot the fit as follows:

sspl=smooth.spline(train.dat$hp, train.dat$y, cv=TRUE)
sspl$df
lines(sspl, lwd=2, col="blue")

(f) (1 point) Use R to produce such a plot and submit a hard copy of it.

This curve looks very different from the one we got using the natural spline with knots at 70, 170, 190, and 210. Unfortunately, we cannot estimate the MSPE in the usual way, since the predict() function seems to work differently on the sspl object that was produced. Unlike the cases when we applied the predict() function to the polynomial, cubic spline, and natural spline objects, both

pred.test=predict(sspl, newdata=test.dat)

and

pred.test=predict(sspl, newdata=list(hp=test.dat$hp))

produce only 83 different values, instead of the 100 values we need to compare to the 100 y values in the test set. (83 is the number of unique values in train.dat$hp.) If one looks at the output of

table(test.dat$hp)

it can be seen that there are only 50 different values of hp in the test set. However, to estimate the MSPE using the test sample, we can do the following:

pred.test=predict(sspl, x=test.dat$hp)
mean((pred.test$y-test.dat$y)^2)

(Note: I don't know why the syntax is different for the smoothing splines than it is for the other methods.) You should get a value of about (which is the worst performance we've gotten so far... maybe because the smoothing spline fit doesn't curve down as much on the extreme right).

Now let's use the local regression function, loess(), to estimate the mean response and make predictions:

locreg=loess(y~hp, span=.5, data=train.dat, degree=1)
lo.pred=predict(locreg, data.frame(hp=hp.grid))
lines(hp.grid, lo.pred, lwd=2, col="blue")
pred.test=predict(locreg, data.frame(hp=test.dat$hp))

(Note: I first tried using span=.2, but it produced a very wiggly fit!) Use the R code above to do the two parts below.

(g) (1 point) Use R to produce a plot of the loess fit and submit a hard copy of it.

(h) (1 point) Now give the estimated MSPE, based on the test data, that comes from using the loess fit to make predictions. (As before, round to 5 significant digits. It can be noted that the value is the smallest of all such MSPE values obtained so far with this data.)

Just for fun, let's try the same thing, except that this time we'll use a 2nd-order fit for the local regressions.

locreg2=loess(y~hp, span=.5, data=train.dat, degree=2)
lo2.pred=predict(locreg2, data.frame(hp=hp.grid))
lines(hp.grid, lo2.pred, lwd=2, col="red")
pred.test=predict(locreg2, data.frame(hp=test.dat$hp))
mean((pred.test-test.dat$y)^2)   # this line appears to have been dropped from the transcription

You should get a value of about , which is the smallest estimated MSPE we've obtained so far. (Note: Using degree=2 to produce local 2nd-order fits is the default for loess().)

Now use the training data to fit a basic multiple regression model with y as the response, and using disp, hp, and wt as predictors. This can be done using

fit1=lm(y~., data=train.dat)
summary(fit1)
mean((predict(fit1,test.dat)-test.dat$y)^2)

(Note: We get a higher value for R^2 from this multiple regression fit than we did from the polynomial fits just using hp, but it can also be noted that the test MSPE is larger here than what we have for the 2nd-order loess fit based on just the single predictor hp.)

An examination of a residual plot suggests a pretty good fit; however, if you look at the scatter plot produced by

plot(train.dat$disp, fit1$res)

you can see that perhaps we need more than just a linear term for disp. As a first attempt at improvement, let's simply add a quadratic term for disp (i.e., add an I(disp^2) term to the first-order formula). If you do this, creating the object fit2, and look at summary(fit2), you can see that disp went from being marginally significant in our initial model to now being highly significant, along with its associated quadratic term.

(i) (1 point) What is the test sample estimate of the test MSPE for the model containing 1st-order terms for hp and wt, and 1st-order and 2nd-order terms for disp? Please round to 5 significant digits.

Now let's fit a GAM. As a first attempt, let's use smoothing spline representations for all three predictors. (This way we don't have to make decisions about knot placement.) If we use the rule of thumb that suggests that you can have 1 df for every 15 observations, we get that we can afford to use 19 df in all. Taking out 1 for the intercept, that leaves 18. So, to be a bit conservative, we'll use 5 for hp, 5 for wt, and 6 for disp (since it appears to be the predictor needing the largest adjustment for nonlinearity). Then we'll look at the plots we can produce and make a new assessment of the situation. So, enter the following:

fit3=gam(y~s(disp,6)+s(hp,5)+s(wt,5), data=train.dat)
par(mfrow=c(1,3))
plot(fit3, se=TRUE, col="blue")

One can see that the hp and wt contributions are at most just a little nonlinear, but that the disp contribution is very nonlinear. So let's cut down on the flexibility allowed for hp and wt, by changing the df for each one, and keep disp as is.

fit4=gam(y~s(disp,6)+s(hp,4)+s(wt,3), data=train.dat)

Now let's use the test sample to estimate the MSPE for our last GAM model to see if our guesses have been good ones.

mean((predict(fit4,test.dat)-test.dat$y)^2)

(j) (1 point) What is the test sample estimate of the test MSPE for the last GAM model (the one having the lower df for hp and wt)? (Round to 5 significant digits (and note that this is the smallest MSPE value so far).)

We could try several more GAM models, possibly including some interaction terms, but let's move on. (Note: I tried a full 3rd-order linear model, having 19 df, fit with OLS, and got an estimated MSPE of . So clearly our GAM did a better job than a more traditional approach. Somewhat oddly, the 3rd-order model made worse predictions than the 1st-order linear model, even though an F test indicated that 2nd-order and 3rd-order terms were needed.)

Now let's grow and examine a regression tree using the tree() function.

fit5=tree(y~., data=train.dat)
summary(fit5)
par(mfrow=c(1,1))
plot(fit5)
text(fit5, pretty=0)

If you enlarge the plot to be full screen, you can see that it's somewhat interesting: first splitting on disp, then splitting both branches formed on hp, then splitting 3 of the 4 branches formed on wt, and then there are no further splits. (With so much symmetry in the tree, it doesn't suggest the presence of strong interactions.) Let's compute an estimate of the MSPE.

mean((predict(fit5,test.dat)-test.dat$y)^2)

Not horrible, considering that regression trees are generally not so competitive, and this one wasn't fine-tuned. So now let's see if using cross-validation to select a right-sized tree will lead to an improvement.

fit6=cv.tree(fit5)
plot(fit6$size, fit6$dev, type="b")

The plot indicates that the 7-node tree is best, and so I guess we're done! (Enlarge the plot to get a better look. You can also examine the contents of fit6$dev.)

(k) (1 point) What is the test sample estimate of the test MSPE for the tree model of fit5? (Round to 5 significant digits.)
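A side note: the tree grown above already has 7 terminal nodes (the 3 of 4 branches split on wt give 4 + 3 leaves), so the cross-validation result says no pruning is needed. If cv.tree() had instead favored a smaller tree, it could be obtained with prune.tree(); a minimal sketch, with a hypothetical best size of 5:

fit7=prune.tree(fit5, best=5)                 # fit7 and the size 5 are hypothetical
plot(fit7)
text(fit7, pretty=0)
mean((predict(fit7,test.dat)-test.dat$y)^2)   # test MSPE of the pruned tree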
