HW 10 STAT 672, Summer 2018

1) (0 points) Do parts (a), (b), (c), and (e) of Exercise 2 on p. 298 of ISL.

2) (0 points) Do Exercise 3 on p. 298 of ISL.

3) For this problem, try to use the 64-bit version of R if possible. Otherwise, state clearly on your paper that you used the 32-bit version. You can merely submit the things that I specifically request in the various parts, or you can submit some of your work in addition to the answers, but if you do that, be sure to highlight the specific things I request in yellow. (Note: Just submitting the bare minimum won't allow you to earn much partial credit for incorrect answers. If you're unsure about something, it may be better to provide some of your R code. (Or better yet, ask me about any troublesome parts of the assignment.))

Attach the Auto data set from the ISLR library. With this exercise, we're going to first use some of the one-predictor regression methods from Sections 7.1 through 7.6 in an attempt to explain miles per gallon using horsepower. But an examination of the scatter plot created using plot(mpg~horsepower) shows that we'll have appreciable heteroscedasticity if we use mpg as the response variable. (The variation in mpg generally increases as horsepower decreases.) So instead we'll use the inverse of the square root of mpg as the response variable, with horsepower as the sole predictor variable for the first portion of this assignment. (Note: I decided on this transformation of mpg by just trying a few things. By making a plot, you can see that not only does it make a constant error term variance assumption much more plausible, but it also creates a closer-to-linear relationship between the response and the predictor.)

Later in the assignment we'll also incorporate some additional predictor variables (displacement and weight), so it'll be best to go ahead and include them as the training and test data are created. Furthermore, let's also load the kknn, glmnet, boot, splines, gam, and tree libraries, since eventually they'll be needed. (Note: You might have to first install the gam and tree libraries if you've never used them, but you don't have to install splines before using it, since it's part of the base installation.)

Now let's create training and test sets of our response and predictors as follows (being sure to set the seed of R's random number generator to 123 right before you create the train vector):

library(ISLR)   # supplies the Auto data set
attach(Auto)
library(kknn)
library(glmnet)
library(boot)
library(splines)
library(gam)
library(tree)
y=1/sqrt(mpg)
set.seed(123)   # set the seed right before creating the train vector, as noted above
train=sample(392,292,replace=FALSE)
train.dat=data.frame(cbind(y[train],displacement[train],horsepower[train],weight[train]))
test.dat=data.frame(cbind(y[-train],displacement[-train],horsepower[-train],weight[-train]))
names(train.dat)=c("y","disp","hp","wt")
names(test.dat)=c("y","disp","hp","wt")

Note that I've made the variable names y, disp, hp, and wt in order to make it easier to type in the various models we want to consider. In order to check things, enter

dim(train.dat)
head(train.dat)
dim(test.dat)
head(test.dat)

You should see that the dimension of train.dat is 292 by 4, and that the first 3 values of hp are 107, 60, and 105. You should also see that the dimension of test.dat is 100 by 4, and that the first 3 values of hp are 165, 150, and 140.
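(Aside: a minimal sketch, not part of the assignment, of the kind of plot mentioned in the note above. It puts the raw and transformed responses side by side, so you can see both the heteroscedasticity and the straightening effect of the transformation. It assumes the Auto data set is attached, as above.)

par(mfrow=c(1,2))               # two plots side by side
plot(horsepower, mpg)           # raw response: spread grows as horsepower decreases
plot(horsepower, 1/sqrt(mpg))   # transformed response: more nearly linear, steadier spread
par(mfrow=c(1,1))               # reset the plotting layout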

Now use the training data to fit a fourth-degree polynomial model having y as the response and hp as the predictor. Although there are a variety of ways that this can be done, please use

poly4=lm(y~poly(hp,4,raw=T), data=train.dat)
summary(poly4)

(since using the above with raw=F gives us a version that's not explained well in the text (nor the videos), and it's an unnecessary complication that we don't need to bother with).

(a) (1 point) What p-value results from the t test associated with the 4th-degree term in the model? (Round to the nearest thousandth (which may be indicating a bit too much accuracy, but the 3 digits will help me make sure that you've done everything correctly up to this point). You should get a large p-value, indicating that a 4th-order polynomial fit may not be necessary.)

Now fit a third-order polynomial model using:

poly3=lm(y~poly(hp,3,raw=T), data=train.dat)
summary(poly3)

(Note that R^2 did not decrease. You should see that the 3rd-order term has a small p-value, indicating that simplifying to a 2nd-order fit may not be good.)

Now let's check to see if our test set MSPE estimates indicate that the 3rd-order model is really superior to the 4th-order model. We can compute the estimated test MSPE for the 4th-order model as follows:

pred.test=predict(poly4, newdata=test.dat)
mean((pred.test-test.dat$y)^2)   # average squared prediction error on the test set

You should get a value of about .

(b) (1 point) Now give the estimated MSPE (based on the test data) for the 3rd-order model. (Report the value by rounding to 5 significant digits (so through the 8th digit after the decimal) so that I can confirm that you've done things correctly. You should see that while it's only a very tiny bit smaller than the estimate obtained from the 4th-order model, the simpler polynomial model did predict better.)

Now let's make a plot showing the 3rd-order polynomial fit, along with some standard error bands. This can be done by doing something similar to what is shown on the middle portion of p. 289 of the text, but I'll make a plot that is a bit less fancy, as follows:

hp.lims=range(train.dat$hp)
hp.grid=seq(from=hp.lims[1], to=hp.lims[2])
preds=predict(poly3,newdata=list(hp=hp.grid),se=TRUE)
se.bands=cbind(preds$fit+2*preds$se.fit, preds$fit-2*preds$se.fit)
plot(train.dat$hp,train.dat$y,xlab="hp",ylab="y")   # draw the scatter plot first, so lines() has a plot to add to
lines(hp.grid,preds$fit,lwd=2,col="blue")
matlines(hp.grid,se.bands,lwd=1,col="blue",lty=3)

(c) (1 point) Use R to produce such a plot and submit a hard copy of it. (You don't have to use any color if you don't have a good way to print color plots, but don't submit a hand-drawn plot! (These guidelines apply to all of the other plots requested in this assignment.))

Now let's fit some spline models. Since our polynomial fits indicate that the cubic polynomial is better than the 4th-degree polynomial, it may be that we don't need a lot of knots to get a good fit, so let's just use one knot, located at 175. (Most of the curvature occurs upwards of 150, so one might be tempted to move the knot even higher. But since there's not a lot of data with values of hp greater than 175, it may be better not to move it any higher.) We can produce such a cubic spline fit, and plot it, as follows:

cubspl=lm(y~bs(hp,knots=c(175)),data=train.dat)
cs.pred=predict(cubspl,newdata=list(hp=hp.grid))
lines(hp.grid,cs.pred,lwd=2,col="blue")

The fitted curve doesn't look much different from the one for part (c), except that it turns down more sharply at the extreme right. (Note: Based on what's in the top shaded box on p. 293 of ISL, one might think that I should use cs.pred$fit instead of just cs.pred in the last line above, but since I didn't include se=T (like the text did) when I used predict(), the way I did it is appropriate.)

(d) (1 point) Now give the estimated MSPE, based on the test data, for the cubic spline model. (Report the value by rounding to 5 significant digits (so through the 8th place after the decimal).)
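(Aside: since the same "predict on the test set, then average the squared errors" computation recurs in parts (b), (d), (e), (i), and (j), a small helper function can save typing. This is just a sketch of one way to do it, not something the assignment requires.)

test.mspe=function(fit) {
  # estimated test MSPE: mean squared prediction error on the held-out data
  pred=predict(fit, newdata=test.dat)
  mean((pred - test.dat$y)^2)
}
test.mspe(poly3)    # e.g., the quantity requested in part (b)
test.mspe(cubspl)   # e.g., the quantity requested in part (d)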

For a natural spline, we can use a total of 5 knots and have the same number of parameters. But let's try using just 4 knots: two close to the ends of the range of hp values, at 70 and 210, and two more in the region where the curvature starts to become more pronounced, at 170 and 190. We can fit the spline and view the fit as follows:

nspl=lm(y~ns(hp,knots=c(70,170,190,210)),data=train.dat)
ns.pred=predict(nspl,newdata=list(hp=hp.grid))
lines(hp.grid,ns.pred,lwd=2,col="blue")

(e) (1 point) Now give the estimated MSPE, based on the test data, for the natural spline model. (Again, round to 5 significant digits. (It can be noted that even though the estimated function perhaps seems to change slope a bit too much, this fit gives us the smallest estimated test MSPE so far.))

So now let's go to a smoothing spline fit, where we don't have to specify knot locations. First we'll let cross-validation select a value for the smoothing parameter and determine the corresponding effective degrees of freedom, and then we'll plot the fit, as follows:

sspl=smooth.spline(train.dat$hp,train.dat$y,cv=TRUE)
sspl$df
lines(sspl,lwd=2,col="blue")

(f) (1 point) Use R to produce such a plot and submit a hard copy of it.

This curve looks very different from the one we got using the natural spline with knots at 70, 170, 190, and 210. Unfortunately, we cannot estimate the MSPE in the usual way, since the predict() function seems to work differently on the sspl object that was produced. Unlike the cases when we applied the predict() function to the polynomial, cubic spline, and natural spline objects, both

pred.test=predict(sspl, newdata=test.dat)

and

pred.test=predict(sspl, newdata=list(hp=test.dat$hp))

only produce 83 different values, instead of the 100 values we need to compare to the 100 y values in the test set. (83 is the number of unique values in train.dat$hp.) If one looks at the output of table(test.dat$hp), it can be seen that there are only 50 different values of hp in the test set. However, to estimate the MSPE using the test sample, we can do the following:

pred.test=predict(sspl, x=test.dat$hp)
mean((pred.test$y-test.dat$y)^2)

(Note: I don't know why the syntax is different for the smoothing splines than it is for the other methods.) You should get a value of about (which is the worst performance we've gotten so far... maybe because the smoothing spline fit doesn't curve down as much on the extreme right).

Now let's use the local regression function, loess(), to estimate the mean response, and make predictions:

locreg=loess(y~hp,span=.5,data=train.dat,degree=1)
lo.pred=predict(locreg,data.frame(hp=hp.grid))
lines(hp.grid,lo.pred,lwd=2,col="blue")
pred.test=predict(locreg,data.frame(hp=test.dat$hp))

(Note: I first tried using span=.2, but it produced a very wiggly fit!) Use the R code above to do the two parts below.

(g) (1 point) Use R to produce a plot of the loess fit and submit a hard copy of it.

(h) (1 point) Now give the estimated MSPE, based on the test data, that comes from using the loess fit (resulting from degree=1 and span=.5) to make predictions. (As before, round to 5 significant digits. (Oddly, the wiggly fit based on span=.2 (that I tried, but didn't ask you to try) led to a smaller estimated MSPE.))
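(Aside: the note about span=.2 can be checked directly. The assignment only asks for degree=1 with span=.5, so treat this loop as a sketch for exploring the span choice, not as required work.)

# Compare estimated test MSPEs across a few span values for degree-1 loess fits.
for (sp in c(.2,.5,.8)) {
  fit=loess(y~hp,span=sp,data=train.dat,degree=1)
  pred=predict(fit,data.frame(hp=test.dat$hp))
  # na.rm=TRUE guards against NA predictions at hp values outside the training range
  cat("span =",sp," estimated MSPE =",mean((pred-test.dat$y)^2,na.rm=TRUE),"\n")
}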

Just for fun, let's try the same thing, except that this time we'll use a 2nd-order fit for the local regressions:

locreg2=loess(y~hp,span=.5,data=train.dat,degree=2)
lo2.pred=predict(locreg2,data.frame(hp=hp.grid))
lines(hp.grid,lo2.pred,lwd=2,col="red")
pred.test=predict(locreg2,data.frame(hp=test.dat$hp))
mean((pred.test-test.dat$y)^2)   # average squared prediction error on the test set

You should get a value of about , which is the smallest estimated MSPE we've obtained so far. (Note: Using degree=2 to produce local 2nd-order fits is the default for loess().)

Now use the training data to fit a basic multiple regression model with y as the response, and using disp, hp, and wt as predictors. This can be done using

fit1=lm(y~., data=train.dat)
summary(fit1)
mean((predict(fit1,test.dat)-test.dat$y)^2)

(Note: We get a higher value for R^2 from this multiple regression fit than we did from the polynomial fits just using hp, but it can also be noted that the test MSPE is larger here than what we have for the 2nd-order loess fit based on just the single predictor hp.)

An examination of a residual plot suggests a pretty good fit; however, if you look at the scatter plot produced by

plot(train.dat$disp,fit1$res)

you can see that perhaps we need more than just a linear term for disp. As a first attempt at improvement, let's simply add a quadratic term for disp. If you do this, creating the object fit2 (one possible call is sketched at the end of this section), and look at summary(fit2), you can see that disp went from being marginally significant in our initial model to now being highly significant, along with its associated quadratic term.

(i) (1 point) What is the test sample estimate of the test MSPE for the model containing 1st-order terms for hp and wt, and 1st-order and 2nd-order terms for disp? Please round to 5 significant digits. (It can be noted that the value is the smallest of all such MSPE values obtained so far with this data.)

Now let's fit a GAM. As a first attempt, let's use smoothing spline representations for all three predictors. (This way we don't have to make decisions about knot placement.) If we use the rule of thumb that suggests that you can have 1 df for every 15 observations, we get that we can afford to use 19 df in all. Taking out 1 for the intercept, that leaves 18. So, to be a bit conservative, we'll use 5 for hp, 5 for wt, and 6 for disp (since it appears to be the predictor needing the largest adjustment for nonlinearity). Then we'll look at the plots we can produce and make a new assessment of the situation. So, enter the following:

fit3=gam(y~s(disp,6)+s(hp,5)+s(wt,5), data=train.dat)
par(mfrow=c(1,3))
plot(fit3, se=TRUE, col="blue")

One can see that the hp and wt contributions are at most just a little nonlinear, but that the disp contribution is very nonlinear. So let's cut down on the flexibility allowed for hp and wt, by changing the df for each one, and keep disp as is:

fit4=gam(y~s(disp,6)+s(hp,4)+s(wt,3), data=train.dat)

Now let's use the test sample to estimate the MSPE for our last GAM model, to see if our guesses have been good ones:

mean((predict(fit4,test.dat)-test.dat$y)^2)

(j) (1 point) What is the test sample estimate of the test MSPE for the last GAM model (the one having the lower df for hp and wt)? (Round to 5 significant digits (and note that this is the smallest MSPE value so far).)

We could try several more GAM models, possibly including some interaction terms, but let's move on. (Note: I tried a full 3rd-order linear model, having 19 df, fit with OLS, and got an estimated MSPE of . So clearly our GAM did a better job than a more traditional approach. Somewhat oddly, the 3rd-order model made worse predictions than the 1st-order linear model, even though F tests indicated that both 2nd-order and 3rd-order terms were needed.)
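(For reference, here is a sketch of one way to create the fit2 object mentioned before part (i). The assignment leaves the exact call to you, so treat the form below as an assumption rather than the required one.)

fit2=lm(y~hp+wt+disp+I(disp^2), data=train.dat)   # 1st-order terms for hp and wt, 1st- and 2nd-order terms for disp
summary(fit2)
mean((predict(fit2,test.dat)-test.dat$y)^2)       # the test sample MSPE estimate requested in part (i)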

Now let's try nearest neighbors regression, first using just hp as a predictor, modifying what's on p. of the class notes as follows:

train.out=train.kknn(y~hp,data=train.dat,kmax=40,distance=2,kernel="epanechnikov")
par(mfrow=c(1,1))
plot(1:40, train.out$MEAN.SQU)

We can obtain the best value to use for K, and get the estimated MSPE (based on cross-validation) corresponding to this value of K, as follows:

which.min(train.out$MEAN.SQU)
train.out$MEAN.SQU[which.min(train.out$MEAN.SQU)]

Then we can modify what's on p. of the class notes in order to make predictions on the test set data, and see what the test set estimate of the MSPE is:

kknn.out=kknn(y~hp,train=train.dat,test=test.dat,k=11,distance=2,kernel="epanechnikov")
mean((test.dat$y - kknn.out$fitted.values)^2)

Although we can see that the estimated MSPE from cross-validation is very close to the test set estimate of the MSPE, it should be noted that all of the other methods considered previously predicted better.

(k) (3 points) Now do a multiple regression version: using hp, disp, and wt as predictors, but keeping everything else as above, use cross-validation to determine the best value for K, and then use this value of K, along with the training data, to make predictions for the test set. Then use these predicted values to estimate the MSPE for predicting future values of the response variable, and give this estimated MSPE, along with the value of K that was determined. Be sure to use set.seed(123) before using train.kknn for this part!

Now let's grow and examine a regression tree using the tree() function:

fit5=tree(y~., data=train.dat)
summary(fit5)
plot(fit5)
text(fit5,pretty=0)

If you enlarge the plot to be full screen, you can see that it's somewhat interesting: first splitting on disp, then splitting both branches formed on hp, then splitting 3 of the 4 branches formed on wt, and then there are no further splits. (With so much symmetry in the tree, it doesn't suggest the presence of strong interactions.) Let's compute an estimate of the MSPE:

mean((predict(fit5,test.dat)-test.dat$y)^2)

Not horrible, considering that regression trees are generally not so competitive, and this one wasn't fine-tuned. (Note that our tree can only produce 7 different predicted values (one for each terminal node of the tree).) So now let's see if using cross-validation to select a right-sized tree will lead to an improvement:

fit6=cv.tree(fit5)
plot(fit6$size,fit6$dev,type="b")

The plot indicates that the 6-node tree is slightly better than the 7-node tree. (Enlarge the plot to get a better look. You can also examine the contents of fit6$dev.) So now let's use the 6-node tree to get predictions, and estimate the test MSPE:

pruned.tree=prune.tree(fit5, best=6)
mean((predict(pruned.tree,test.dat)-test.dat$y)^2)

(It can be noted that the test sample estimate of the test MSPE was slightly lower for the original 7-node tree (and so maybe cross-validation misled us).)

(l) (2 points) What is the test sample estimate of the test MSPE for the pruned tree selected by cross-validation? (Round to 5 significant digits.)
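(One last aside: the note above says the unpruned tree can produce only 7 distinct predicted values, one per terminal node. A one-line sketch, not part of the assignment, makes that concrete.)

table(predict(fit5,test.dat))   # the test set predictions take at most 7 distinct values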
