Lab 6 More Linear Regression

Corrections from Lab 5:

Last week we produced a scatter plot using the code shown below.

    plot(sat$verbal, sat$math,, col=c(1,2))
    legend("bottomright", legend=c("Male", "Female"), pch=c(1,2), col=c(1,2))

This plot is incorrect for two reasons. First, when you need to distinguish your dots by category (like Male, Female), I recommend picking one way: either change the symbols or change the colors. We tried to do both, and unknowingly ended up with 4 categories (black/red circles, black/red triangles)!

To choose different dot symbols, use the argument pch=as.integer(sat$sex)
To choose different dot colors, use the argument col=as.integer(sat$sex)

(Both assume sat$sex is a factor; if it was read in as text, use as.integer(factor(sat$sex)) instead.)

Second, R assigns symbols/colors to categories in alphabetical order. My legend should have had the argument legend=c("Female", "Male") instead of legend=c("Male", "Female"). As written, the legend named the symbols backwards: black circles indicated Females, not Males.

R Tips:

[1] Last week a student asked me how to tell R what to do when a data set variable (like V1) contains blanks or codes such as 9999 instead of NA or legal values, since R sometimes gives error messages when it encounters these cells while producing plots or statistics. My general answer, given to me by a student of STAT 452, is the following. Say you have 3 variables, V1, V2, V3, which probably have bad/blank data values in them, and you want to correct that. Use the command

    na.omit(data.frame(V1, V2, V3))

or, if you want to name the new data frame constructed,

    dataname <- na.omit(data.frame(V1, V2, V3))

This drops every row that contains a missing value, which takes care of the cells that mess up the R function you are trying to use. Note that na.omit() removes only NA (and NaN) values. Blank cells in a numeric column are normally read in as NA already, but a code like 9999 must first be recoded, for example with V1[V1 == 9999] <- NA, before na.omit() will remove it.

Be careful with the assignment symbol <-. Leave a space on either side of it, and keep its 2 characters together in your code, to avoid the following confusions.

    x < -5     (is x less than -5?)
    x <-5      (is x assigned 5, or is x less than -5? R reads this as assignment)
    x < - 5    (is x less than minus 5?)
    etc.

Transforming Variables:

Last week we had the following scatter plot in our pairs() display, resulting from the STAT 216 data.

    # transforming variables
    stat216 <- read.csv("c:/users/michael/desktop/stat216.csv")
    plot(stat216$text, stat216$facebook, main="facebook friends vs TEXT per day")

We will do a transformation of x, then y, then both, to see what results.

    # transforming variables
    logface <- log(stat216$facebook) ; logtext <- log(stat216$text)
    # note that log() finds the natural log, base e
    # log10() would find the common log, base 10
    plot(logtext, stat216$facebook, main="logging x variable")
    plot(stat216$text, logface, main="logging y variable")
    plot(logtext, logface, main="logging both variables")

As demonstrated, if you have many data points clustered in the bottom left (low values of x and y) and a few high values (of x, y, or both), then a log transform is worth inspecting: it spreads the low points out along the x and y axes without moving the high-end values very much from their relative positions. We see that logging both x and y gives us our best chance to model a linear fit to the data. Let's inspect our data.

    logtext ; logface

We have to clean up the NA, NaN, and Inf cells in our results before we can properly model and graph. Forming the data frame data1 from logtext and logface got rid of the NA and NaN.
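The step that forms data1 can be sketched as follows. This is only an illustration under assumptions: that data1 is a data frame holding logtext and logface, and that na.omit() was used as in the R tip earlier. The short text and facebook vectors are made-up stand-ins for the stat216 columns.

```r
# Hypothetical stand-ins for stat216$text and stat216$facebook
text     <- c(0, 5, 20, NA, 100, 300)
facebook <- c(150, 0, 200, 400, NA, 900)

logtext <- log(text)       # log(0) gives -Inf, log(NA) gives NA
logface <- log(facebook)

# Assumed construction of data1: combine the columns, drop rows with NA/NaN
data1 <- na.omit(data.frame(logtext, logface))

# -Inf is not NA, so rows containing it survive na.omit()
print(data1)
print(any(is.infinite(as.matrix(data1))))
```

Rows with -Inf are still present after this step, which is why a separate is.finite() filter is needed before fitting.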

    remove1 <- apply(data1, 1, function(x) all(is.finite(x)))
    data2 <- data1[remove1,]
    data2

This little routine got rid of all Inf cells, resulting in data2. I found this R routine online and posted the information in hint1.pdf in Moodle. We now have clean data to finish our regression analysis.

    # make lm model now that NA, NaN, and Inf are removed from the data set
    model1 <- lm(data2$logface ~ data2$logtext)
    model1
    summary(model1)
    scatter.smooth(data2$logtext, data2$logface, lwd=2, main="face on TEXT")
    abline(model1, lty=2, col="red", lwd=3)
    # make residual plot
    plot(predict(model1), resid(model1), main="residual plot of FACEBOOK=0.26TEXT ")
    abline(h=0)

We see that our line of best fit matches the smooth line well, and our residual plot seems to confirm that the linear model is appropriate. So, our line of fit (model) has the form

    ln(face) = b0 + b1 ln(text)

where b0 and b1 are the intercept and slope reported for model1. So, if we have 100 text messages per day, we predict

    ln(face) = b0 + b1 ln(100) = b0 + b1 (4.6052)

and FACE is e raised to that value, in Facebook friends.

How accurate is this model? The multiple R-squared (r^2) from the summary() output corresponds to a correlation r that is rather weak. So, we do not have much confidence in our prediction, since the relationship between Facebook friends and text messages/day is so weak.
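The prediction arithmetic above can be checked in R rather than by hand. The sketch below is not the lab's data: data2 here is a small simulated stand-in, and the model is written with the data= argument (an editorial choice, not the form used above) so that predict() can take new x values by name.

```r
# Hypothetical clean data standing in for data2 (not the lab's values)
set.seed(1)
data2 <- data.frame(logtext = log(c(2, 5, 10, 20, 40, 80, 150, 300)))
data2$logface <- 4 + 0.3 * data2$logtext + rnorm(8, sd = 0.5)

# Using data= lets predict() accept new x values by column name
model1 <- lm(logface ~ logtext, data = data2)

# Predict ln(face) at 100 texts per day, then undo the log with exp()
ln_face <- predict(model1, newdata = data.frame(logtext = log(100)))
face    <- exp(ln_face)

# In simple regression, r has the sign of the slope and size sqrt(R-squared)
r2 <- summary(model1)$r.squared
r  <- sign(coef(model1)[2]) * sqrt(r2)

print(c(ln_face = unname(ln_face), face = unname(face), r = unname(r)))
```

With a model fitted as lm(data2$logface ~ data2$logtext), predict() cannot be given new x values this way, which is one reason to prefer the data= form when predictions are needed.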

Another useful package:

I have found the package UsingR to be very useful when doing linear regression problems because of the commands simple.lm() and plot(simple.lm()). See below for our use of this package in our problem.

    # install UsingR first if needed: install.packages("UsingR")
    library(UsingR)
    model2 <- simple.lm(data2$logtext, data2$logface)
    model2

You get the model coefficients and a scatter plot with the fit line superimposed.

    summary(model2)

prints the model summary, and

    plot(model2)

results in 4 plots (residuals plot, normal Q-Q plot, root of standardized residuals plot, and residuals vs leverage plot). R shows these one at a time, waiting on the Console before each one.

The Console prompts you to hit the return key for each graph. The Q-Q plot, standardized residuals plot, and leverage plot are tools used in more in-depth regression analysis, which Dave will talk about sometime in his STAT sequence. We are mostly interested in the original scatter plot and residuals plot now. If you want 4 main plots to appear at one time without hitting ENTER, type

    simple.lm(data2$logtext, data2$logface, show.residuals=TRUE)

For now, these 4 graphs may be the most useful collection for your regression studies. Finally, we might be interested in producing prediction bands and 90% confidence bands.

    par(fig=c(0,1,0,1))
    simple.lm(data2$logtext, data2$logface, show.ci=TRUE, conf.level=0.90)

Note that the par() command used before plotting reset the plotting region back to the default of 1 plot per window. The 4-plots-per-window layout was set up automatically when we invoked simple.lm(); to avoid using only ¼ of the window for future plotting, reset the par() partitioning of the graphics view.

Back to the lm() command results:

When you set up a model with lm(), you create a model object from which many quantities can be extracted. You should be aware of the functions that do this and possibly use them for various purposes. Many of them are listed below.

    summary()     returns summary information about the regression
    plot()        makes diagnostic plots
    coef()        returns the coefficients
    residuals()   returns the residuals (can be abbreviated resid())
    fitted()      returns the fitted values (the predicted y values)
    deviance()    returns the residual sum of squares (RSS)
    predict()     performs model predictions
    anova()       finds various sums of squares
    AIC()         is used for model selection

Some of these topics will be discussed later in Dave's class. Let us use some of these functions to produce by hand a confidence interval/prediction interval band plot, which simple.lm() produced rather automatically above. See code below, using data set patients.txt (which contains various physical characteristics of patients). It is a nice linearly modeled data set with a very strong linear relationship between height and weight.

    # prediction and CI bands
    # =======================
    patients <- read.delim("c:/users/michael/desktop/patients.txt")
    patients
    height <- patients[,1] ; weight <- patients[,2]
    catheter <- patients[,3]
    model3 <- lm(weight ~ height)
    plot(height, weight, main="weight on height")
    abline(model3)
    xval <- data.frame(height=seq(20, 65, 1))
    conf <- predict(model3, xval, interval="confidence")
    pred <- predict(model3, xval, interval="prediction")
    matlines(xval$height, conf[,c("lwr", "upr")], col="red", lty=1, lwd=3)
    matlines(xval$height, pred[,c("lwr", "upr")], col="blue", lty=2, lwd=3)
    text(35, 80, labels="red solid is CI, blue dashed is prediction")
    summary(model3)
    conf
    pred
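The extractor functions and the band construction can be tried without patients.txt. The sketch below makes two assumptions that are not part of the lab: it uses R's built-in women data set (height and weight) as a stand-in, and it sets level=0.90 to match the 90% level used with simple.lm() earlier (predict() defaults to 0.95).

```r
# Built-in 'women' data (height, weight) as a stand-in for patients.txt
model3 <- lm(weight ~ height, data = women)

# Extractor functions applied to the model object
b   <- coef(model3)        # intercept and slope
rss <- deviance(model3)    # residual sum of squares
print(b)
print(c(rss = rss, by_hand = sum(resid(model3)^2)))

# Interval bands over a grid of heights, at the 90% level
xval <- data.frame(height = seq(58, 72, 1))
conf <- predict(model3, xval, interval = "confidence", level = 0.90)
pred <- predict(model3, xval, interval = "prediction", level = 0.90)

plot(women$height, women$weight, main = "weight on height")
abline(model3)
matlines(xval$height, conf[, c("lwr", "upr")], col = "red",  lty = 1, lwd = 3)
matlines(xval$height, pred[, c("lwr", "upr")], col = "blue", lty = 2, lwd = 3)
```

The prediction band always lies outside the confidence band, because it accounts for the scatter of individual observations as well as the uncertainty in the fitted line.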

Homework [1]: Perform a linear regression analysis on AIRFARE vs DISTANCE from the dataset Airfares.csv (no transformation needed). Be sure to include the scatter plot with line of fit, smooth line superimposed, model summary(), and a statement of the final linear model appropriate for the result, along with correlation values and other elements you deem necessary to this study.

Homework [2]: Remove any point or two which seems not to fit in the model of HW[1], but rather appears as outlier(s), and rerun your linear study. Why might the point or two be legitimately removed from this listing of airline pricing per mile, since this pricing assumes equal pricing from city to city?

Homework [3]: Look through the matrix of scatter plots from stat216.csv and pick 2 other variables to develop a linear regression model from transformed variables (like logging x or y or both), as I did above with FACE on TEXT. Be sure to produce a model summary(), a scatter plot with line of best fit superimposed on it, and a residual plot.

Homework [4]: The data set cloudseed.csv contains rain generating data collected in Texas to overcome a summer drought. Perform a transformation (hint: try log()) on variable(s) and produce an acceptable linear model of SEEDED vs UNSEEDED (SEEDED is response) cloud variable results (units unknown). Produce whatever statistics and graphics you deem acceptable for a report on the modeling for this problem.

Extra Information not required for this lab: Transforming Variables

When we look at scatter plots to find regression model fits for y on x, we would like to model a linear fit, if appropriate. We can judge linearity by looking at the scatter plot or, more appropriately, by looking at the residual plot. We could also introduce a curve of best fit (like some sort of lowess line) to help us believe that a linear fit is at least not inappropriate.

When a linear fit is not appropriate, we can transform the x, the y, or both variables, in order to get some form of a line to fit the resulting transformed (x, y) points. We are not changing the data by transforming, if we use transformation functions which are mathematically one-to-one and onto. It is like converting a distance from feet to meters: we have not changed the data, just the numbers describing the data.

Below is a nice chart of ideas how to transform variables, depending upon what you originally have.

To do transform #1, use the following commands and formulas:

    y-hat = β0 + β1 sqrt(x) + ε      lm(y ~ sqrt(x))     plot(sqrt(x), y)
    y-hat = β0 + β1 ln(x) + ε        lm(y ~ log(x))      plot(log(x), y)
    y-hat = β0 + β1 x^(-1) + ε       lm(y ~ I(1/x))      plot(1/x, y)

Note that 1/x must be wrapped in I() inside the formula; written as y ~ 1/x, R interprets the / as a formula operator rather than as division.

To do transform #2, use the following commands and formula:

    y-hat = β0 + β1 x + β2 x^2 + ε      lm(y ~ x + I(x^2))      plot(x, y)

To do transform #4, use the following commands and formula:

    (y-hat)^2 = β0 + β1 x + ε           lm(I(y^2) ~ x)          plot(x, y^2)

To do transform #5, use the following commands and formulas:

    ln(y-hat) = β0 + β1 x + ε           lm(log(y) ~ x)          plot(x, log(y))
    ln(y-hat) = β0 + β1 ln(x) + ε       lm(log(y) ~ log(x))     plot(log(x), log(y))

There are other ways to do these formulas, for example, by making up a variable consisting of log(x), sqrt(x), etc., calling it var1, and using var1 in the formula of the lm() and plot() commands, as needed.
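The two ways of writing a transformed model give the same fit. A minimal sketch on simulated data (x, y, and the fit_ names are all hypothetical, chosen only for illustration):

```r
# Simulated data (hypothetical): y depends linearly on log(x)
set.seed(2)
x <- runif(50, 1, 100)
y <- 3 + 2 * log(x) + rnorm(50, sd = 0.3)

# Transform inside the formula ...
fit_a <- lm(y ~ log(x))

# ... or build the transformed variable first and use it by name
var1  <- log(x)
fit_b <- lm(y ~ var1)

# Arithmetic such as 1/x or x^2 must be wrapped in I() inside a formula
fit_recip <- lm(y ~ I(1/x))
fit_quad  <- lm(y ~ x + I(x^2))

print(unname(coef(fit_a)))
print(unname(coef(fit_b)))
print(names(coef(fit_recip)))
```

Function calls like log(x) and sqrt(x) are safe inside a formula as written; only the operators that have a special formula meaning (such as ^, /, *, and +) need the I() wrapper.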

Problem set for Week 7 Linear models: Linear regression, multiple linear regression, ANOVA, ANCOVA

Problem set for Week 7 Linear models: Linear regression, multiple linear regression, ANOVA, ANCOVA ECL 290 Statistical Models in Ecology using R Problem set for Week 7 Linear models: Linear regression, multiple linear regression, ANOVA, ANCOVA Datasets in this problem set adapted from those provided

More information

Your Name: Section: INTRODUCTION TO STATISTICAL REASONING Computer Lab #4 Scatterplots and Regression

Your Name: Section: INTRODUCTION TO STATISTICAL REASONING Computer Lab #4 Scatterplots and Regression Your Name: Section: 36-201 INTRODUCTION TO STATISTICAL REASONING Computer Lab #4 Scatterplots and Regression Objectives: 1. To learn how to interpret scatterplots. Specifically you will investigate, using

More information

How to use FSBforecast Excel add in for regression analysis

How to use FSBforecast Excel add in for regression analysis How to use FSBforecast Excel add in for regression analysis FSBforecast is an Excel add in for data analysis and regression that was developed here at the Fuqua School of Business over the last 3 years

More information

Practice in R. 1 Sivan s practice. 2 Hetroskadasticity. January 28, (pdf version)

Practice in R. 1 Sivan s practice. 2 Hetroskadasticity. January 28, (pdf version) Practice in R January 28, 2010 (pdf version) 1 Sivan s practice Her practice file should be (here), or check the web for a more useful pointer. 2 Hetroskadasticity ˆ Let s make some hetroskadastic data:

More information

Discussion Notes 3 Stepwise Regression and Model Selection

Discussion Notes 3 Stepwise Regression and Model Selection Discussion Notes 3 Stepwise Regression and Model Selection Stepwise Regression There are many different commands for doing stepwise regression. Here we introduce the command step. There are many arguments

More information

36-402/608 HW #1 Solutions 1/21/2010

36-402/608 HW #1 Solutions 1/21/2010 36-402/608 HW #1 Solutions 1/21/2010 1. t-test (20 points) Use fullbumpus.r to set up the data from fullbumpus.txt (both at Blackboard/Assignments). For this problem, analyze the full dataset together

More information

Applied Regression Modeling: A Business Approach

Applied Regression Modeling: A Business Approach i Applied Regression Modeling: A Business Approach Computer software help: SAS SAS (originally Statistical Analysis Software ) is a commercial statistical software package based on a powerful programming

More information

Further Maths Notes. Common Mistakes. Read the bold words in the exam! Always check data entry. Write equations in terms of variables

Further Maths Notes. Common Mistakes. Read the bold words in the exam! Always check data entry. Write equations in terms of variables Further Maths Notes Common Mistakes Read the bold words in the exam! Always check data entry Remember to interpret data with the multipliers specified (e.g. in thousands) Write equations in terms of variables

More information

BIO 360: Vertebrate Physiology Lab 9: Graphing in Excel. Lab 9: Graphing: how, why, when, and what does it mean? Due 3/26

BIO 360: Vertebrate Physiology Lab 9: Graphing in Excel. Lab 9: Graphing: how, why, when, and what does it mean? Due 3/26 Lab 9: Graphing: how, why, when, and what does it mean? Due 3/26 INTRODUCTION Graphs are one of the most important aspects of data analysis and presentation of your of data. They are visual representations

More information

STAT 2607 REVIEW PROBLEMS Word problems must be answered in words of the problem.

STAT 2607 REVIEW PROBLEMS Word problems must be answered in words of the problem. STAT 2607 REVIEW PROBLEMS 1 REMINDER: On the final exam 1. Word problems must be answered in words of the problem. 2. "Test" means that you must carry out a formal hypothesis testing procedure with H0,

More information

Tips and Guidance for Analyzing Data. Executive Summary

Tips and Guidance for Analyzing Data. Executive Summary Tips and Guidance for Analyzing Data Executive Summary This document has information and suggestions about three things: 1) how to quickly do a preliminary analysis of time-series data; 2) key things to

More information

Here is the data collected.

Here is the data collected. Introduction to Scientific Analysis of Data Using Spreadsheets. Computer spreadsheets are very powerful tools that are widely used in Business, Science, and Engineering to perform calculations and record,

More information

R Exercise Practical 2. Follow the instructions described in this sheet. Type or paste any answers you are asked for into a Microsoft Word document.

R Exercise Practical 2. Follow the instructions described in this sheet. Type or paste any answers you are asked for into a Microsoft Word document. R Exercise Practical 2 Follow the instructions described in this sheet. Type or paste any answers you are asked for into a Microsoft Word document. Note: You should have this sheet on your computer for

More information

1. What specialist uses information obtained from bones to help police solve crimes?

1. What specialist uses information obtained from bones to help police solve crimes? Mathematics: Modeling Our World Unit 4: PREDICTION HANDOUT VIDEO VIEWING GUIDE H4.1 1. What specialist uses information obtained from bones to help police solve crimes? 2.What are some things that can

More information

Data Analysis Multiple Regression

Data Analysis Multiple Regression Introduction Visual-XSel 14.0 is both, a powerful software to create a DoE (Design of Experiment) as well as to evaluate the results, or historical data. After starting the software, the main guide shows

More information

Using Excel for Graphical Analysis of Data

Using Excel for Graphical Analysis of Data EXERCISE Using Excel for Graphical Analysis of Data Introduction In several upcoming experiments, a primary goal will be to determine the mathematical relationship between two variable physical parameters.

More information

ME 142 Engineering Computation I. Graphing with Excel

ME 142 Engineering Computation I. Graphing with Excel ME 142 Engineering Computation I Graphing with Excel Common Questions from Unit 1.2 HW 1.2.2 See 1.2.2 Homework Exercise Hints video Use ATAN to find nominal angle in each quadrant Use the AND logical

More information

Using Excel for Graphical Analysis of Data

Using Excel for Graphical Analysis of Data Using Excel for Graphical Analysis of Data Introduction In several upcoming labs, a primary goal will be to determine the mathematical relationship between two variable physical parameters. Graphs are

More information

Here is Kellogg s custom menu for their core statistics class, which can be loaded by typing the do statement shown in the command window at the very

Here is Kellogg s custom menu for their core statistics class, which can be loaded by typing the do statement shown in the command window at the very Here is Kellogg s custom menu for their core statistics class, which can be loaded by typing the do statement shown in the command window at the very bottom of the screen: 4 The univariate statistics command

More information

SLStats.notebook. January 12, Statistics:

SLStats.notebook. January 12, Statistics: Statistics: 1 2 3 Ways to display data: 4 generic arithmetic mean sample 14A: Opener, #3,4 (Vocabulary, histograms, frequency tables, stem and leaf) 14B.1: #3,5,8,9,11,12,14,15,16 (Mean, median, mode,

More information

Lecture 25: Review I

Lecture 25: Review I Lecture 25: Review I Reading: Up to chapter 5 in ISLR. STATS 202: Data mining and analysis Jonathan Taylor 1 / 18 Unsupervised learning In unsupervised learning, all the variables are on equal standing,

More information

Section 3.4: Diagnostics and Transformations. Jared S. Murray The University of Texas at Austin McCombs School of Business

Section 3.4: Diagnostics and Transformations. Jared S. Murray The University of Texas at Austin McCombs School of Business Section 3.4: Diagnostics and Transformations Jared S. Murray The University of Texas at Austin McCombs School of Business 1 Regression Model Assumptions Y i = β 0 + β 1 X i + ɛ Recall the key assumptions

More information

The first thing we ll need is some numbers. I m going to use the set of times and drug concentration levels in a patient s bloodstream given below.

The first thing we ll need is some numbers. I m going to use the set of times and drug concentration levels in a patient s bloodstream given below. Graphing in Excel featuring Excel 2007 1 A spreadsheet can be a powerful tool for analyzing and graphing data, but it works completely differently from the graphing calculator that you re used to. If you

More information

DoE with Visual-XSel 13.0

DoE with Visual-XSel 13.0 Introduction Visual-XSel 13.0 is both, a powerful software to create a DoE (Design of Experiment) as well as to evaluate the results, or historical data. After starting the software, the main guide shows

More information

Minitab 17 commands Prepared by Jeffrey S. Simonoff

Minitab 17 commands Prepared by Jeffrey S. Simonoff Minitab 17 commands Prepared by Jeffrey S. Simonoff Data entry and manipulation To enter data by hand, click on the Worksheet window, and enter the values in as you would in any spreadsheet. To then save

More information

Lab 1 (fall, 2017) Introduction to R and R Studio

Lab 1 (fall, 2017) Introduction to R and R Studio Lab 1 (fall, 201) Introduction to R and R Studio Introduction: Today we will use R, as presented in the R Studio environment (or front end), in an introductory setting. We will make some calculations,

More information

VCEasy VISUAL FURTHER MATHS. Overview

VCEasy VISUAL FURTHER MATHS. Overview VCEasy VISUAL FURTHER MATHS Overview This booklet is a visual overview of the knowledge required for the VCE Year 12 Further Maths examination.! This booklet does not replace any existing resources that

More information

ST Lab 1 - The basics of SAS

ST Lab 1 - The basics of SAS ST 512 - Lab 1 - The basics of SAS What is SAS? SAS is a programming language based in C. For the most part SAS works in procedures called proc s. For instance, to do a correlation analysis there is proc

More information

TI-84 GRAPHING CALCULATOR

TI-84 GRAPHING CALCULATOR TI-84 GRAPHING CALCULATOR Table of Contents Set Up & Troubleshooting... 3 TI-84: Resetting the Calculator... 4 TI-84: Mode Settings... 5 Entering Data... 7 TI-84: Entering & Editing Lists of Data... 8

More information

Excel Spreadsheets and Graphs

Excel Spreadsheets and Graphs Excel Spreadsheets and Graphs Spreadsheets are useful for making tables and graphs and for doing repeated calculations on a set of data. A blank spreadsheet consists of a number of cells (just blank spaces

More information

Logical operators: R provides an extensive list of logical operators. These include

Logical operators: R provides an extensive list of logical operators. These include meat.r: Explanation of code Goals of code: Analyzing a subset of data Creating data frames with specified X values Calculating confidence and prediction intervals Lists and matrices Only printing a few

More information

Chapter 4: Analyzing Bivariate Data with Fathom

Chapter 4: Analyzing Bivariate Data with Fathom Chapter 4: Analyzing Bivariate Data with Fathom Summary: Building from ideas introduced in Chapter 3, teachers continue to analyze automobile data using Fathom to look for relationships between two quantitative

More information

Regression on the trees data with R

Regression on the trees data with R > trees Girth Height Volume 1 8.3 70 10.3 2 8.6 65 10.3 3 8.8 63 10.2 4 10.5 72 16.4 5 10.7 81 18.8 6 10.8 83 19.7 7 11.0 66 15.6 8 11.0 75 18.2 9 11.1 80 22.6 10 11.2 75 19.9 11 11.3 79 24.2 12 11.4 76

More information

2011 Excellence in Mathematics Contest Team Project Level II (Below Precalculus) School Name: Group Members:

2011 Excellence in Mathematics Contest Team Project Level II (Below Precalculus) School Name: Group Members: 011 Excellence in Mathematics Contest Team Project Level II (Below Precalculus) School Name: Group Members: Reference Sheet Formulas and Facts You may need to use some of the following formulas and facts

More information

RSM Split-Plot Designs & Diagnostics Solve Real-World Problems

RSM Split-Plot Designs & Diagnostics Solve Real-World Problems RSM Split-Plot Designs & Diagnostics Solve Real-World Problems Shari Kraber Pat Whitcomb Martin Bezener Stat-Ease, Inc. Stat-Ease, Inc. Stat-Ease, Inc. 221 E. Hennepin Ave. 221 E. Hennepin Ave. 221 E.

More information

Dealing with Data in Excel 2013/2016

Dealing with Data in Excel 2013/2016 Dealing with Data in Excel 2013/2016 Excel provides the ability to do computations and graphing of data. Here we provide the basics and some advanced capabilities available in Excel that are useful for

More information

Project 11 Graphs (Using MS Excel Version )

Project 11 Graphs (Using MS Excel Version ) Project 11 Graphs (Using MS Excel Version 2007-10) Purpose: To review the types of graphs, and use MS Excel 2010 to create them from a dataset. Outline: You will be provided with several datasets and will

More information

STAT 311 (3 CREDITS) VARIANCE AND REGRESSION ANALYSIS ELECTIVE: ALL STUDENTS. CONTENT Introduction to Computer application of variance and regression

STAT 311 (3 CREDITS) VARIANCE AND REGRESSION ANALYSIS ELECTIVE: ALL STUDENTS. CONTENT Introduction to Computer application of variance and regression STAT 311 (3 CREDITS) VARIANCE AND REGRESSION ANALYSIS ELECTIVE: ALL STUDENTS. CONTENT Introduction to Computer application of variance and regression analysis. Analysis of Variance: one way classification,

More information

Non-Linear Regression. Business Analytics Practice Winter Term 2015/16 Stefan Feuerriegel

Non-Linear Regression. Business Analytics Practice Winter Term 2015/16 Stefan Feuerriegel Non-Linear Regression Business Analytics Practice Winter Term 2015/16 Stefan Feuerriegel Today s Lecture Objectives 1 Understanding the need for non-parametric regressions 2 Familiarizing with two common

More information

Lecture 13: Model selection and regularization

Lecture 13: Model selection and regularization Lecture 13: Model selection and regularization Reading: Sections 6.1-6.2.1 STATS 202: Data mining and analysis October 23, 2017 1 / 17 What do we know so far In linear regression, adding predictors always

More information

Year 10 General Mathematics Unit 2

Year 10 General Mathematics Unit 2 Year 11 General Maths Year 10 General Mathematics Unit 2 Bivariate Data Chapter 4 Chapter Four 1 st Edition 2 nd Edition 2013 4A 1, 2, 3, 4, 6, 7, 8, 9, 10, 11 1, 2, 3, 4, 6, 7, 8, 9, 10, 11 2F (FM) 1,

More information

CHAPTER 6. The Normal Probability Distribution

CHAPTER 6. The Normal Probability Distribution The Normal Probability Distribution CHAPTER 6 The normal probability distribution is the most widely used distribution in statistics as many statistical procedures are built around it. The central limit

More information

Resources for statistical assistance. Quantitative covariates and regression analysis. Methods for predicting continuous outcomes.

Resources for statistical assistance. Quantitative covariates and regression analysis. Methods for predicting continuous outcomes. Resources for statistical assistance Quantitative covariates and regression analysis Carolyn Taylor Applied Statistics and Data Science Group (ASDa) Department of Statistics, UBC January 24, 2017 Department

More information

9.8 Rockin the Residuals

9.8 Rockin the Residuals 42 SECONDARY MATH 1 // MODULE 9 9.8 Rockin the Residuals A Solidify Understanding Task The correlation coefficient is not the only tool that statisticians use to analyze whether or not a line is a good

More information

Things to Know for the Algebra I Regents

Things to Know for the Algebra I Regents Types of Numbers: Real Number: any number you can think of (integers, rational, irrational) Imaginary Number: square root of a negative number Integers: whole numbers (positive, negative, zero) Things

More information

Statistical Good Practice Guidelines. 1. Introduction. Contents. SSC home Using Excel for Statistics - Tips and Warnings

Statistical Good Practice Guidelines. 1. Introduction. Contents. SSC home Using Excel for Statistics - Tips and Warnings Statistical Good Practice Guidelines SSC home Using Excel for Statistics - Tips and Warnings On-line version 2 - March 2001 This is one in a series of guides for research and support staff involved in

More information

You are to turn in the following three graphs at the beginning of class on Wednesday, January 21.

You are to turn in the following three graphs at the beginning of class on Wednesday, January 21. Computer Tools for Data Analysis & Presentation Graphs All public machines on campus are now equipped with Word 2010 and Excel 2010. Although fancier graphical and statistical analysis programs exist,

More information

8. MINITAB COMMANDS WEEK-BY-WEEK

8. MINITAB COMMANDS WEEK-BY-WEEK 8. MINITAB COMMANDS WEEK-BY-WEEK In this section of the Study Guide, we give brief information about the Minitab commands that are needed to apply the statistical methods in each week s study. They are

More information

How to use Excel Spreadsheets for Graphing

How to use Excel Spreadsheets for Graphing How to use Excel Spreadsheets for Graphing 1. Click on the Excel Program on the Desktop 2. You will notice that a screen similar to the above screen comes up. A spreadsheet is divided into Columns (A,

More information

Logistic Regression. (Dichotomous predicted variable) Tim Frasier

Logistic Regression. (Dichotomous predicted variable) Tim Frasier Logistic Regression (Dichotomous predicted variable) Tim Frasier Copyright Tim Frasier This work is licensed under the Creative Commons Attribution 4.0 International license. Click here for more information.

More information

EXST 7014, Lab 1: Review of R Programming Basics and Simple Linear Regression
