Applied Regression Modeling: A Business Approach

Similar documents
Applied Regression Modeling: A Business Approach

Applied Regression Modeling: A Business Approach

Brief Guide on Using SPSS 10.0

Minitab 17 commands Prepared by Jeffrey S. Simonoff

8. MINITAB COMMANDS WEEK-BY-WEEK

SPSS QM II. SPSS Manual Quantitative methods II (7.5hp) SHORT INSTRUCTIONS BE CAREFUL

Minitab Study Card J ENNIFER L EWIS P RIESTLEY, PH.D.

Here is Kellogg s custom menu for their core statistics class, which can be loaded by typing the do statement shown in the command window at the very

Bluman & Mayer, Elementary Statistics, A Step by Step Approach, Canadian Edition

Excel 2010 with XLSTAT

Introduction. About this Document. What is SPSS. ohow to get SPSS. oopening Data

1. Basic Steps for Data Analysis Data Editor. 2.4.To create a new SPSS file

SPSS. (Statistical Packages for the Social Sciences)

An introduction to SPSS

THE L.L. THURSTONE PSYCHOMETRIC LABORATORY UNIVERSITY OF NORTH CAROLINA. Forrest W. Young & Carla M. Bann

MINITAB 17 BASICS REFERENCE GUIDE

STATA 13 INTRODUCTION

Your Name: Section: INTRODUCTION TO STATISTICAL REASONING Computer Lab #4 Scatterplots and Regression

Contents. Tutorials Section 1. About SAS Enterprise Guide ix About This Book xi Acknowledgments xiii

Math 227 EXCEL / MEGASTAT Guide

Fathom Dynamic Data TM Version 2 Specifications

Math 121 Project 4: Graphs

Meet MINITAB. Student Release 14. for Windows

book 2014/5/6 15:21 page v #3 List of figures List of tables Preface to the second edition Preface to the first edition

CHAPTER 6. The Normal Probability Distribution

Data Management - 50%

Index. Bar charts, 106 bartlett.test function, 159 Bottles dataset, 69 Box plots, 113

Research Methods for Business and Management. Session 8a- Analyzing Quantitative Data- using SPSS 16 Andre Samuel

Table of Contents (As covered from textbook)

STA 570 Spring Lecture 5 Tuesday, Feb 1

JMP 10 Student Edition Quick Guide

STAT STATISTICAL METHODS. Statistics: The science of using data to make decisions and draw conclusions

1 Introduction to Using Excel Spreadsheets

Fall 2016 CS130 - Regression Analysis 1 7. REGRESSION. Fall 2016

Making the Transition from R-code to Arc

Learner Expectations UNIT 1: GRAPICAL AND NUMERIC REPRESENTATIONS OF DATA. Sept. Fathom Lab: Distributions and Best Methods of Display

STAT 2607 REVIEW PROBLEMS Word problems must be answered in words of the problem.

Chapter 3: Data Description Calculate Mean, Median, Mode, Range, Variation, Standard Deviation, Quartiles, standard scores; construct Boxplots.

Getting Started with Minitab 17

In Minitab interface has two windows named Session window and Worksheet window.

1. Data Analysis Yields Numbers & Visualizations. 2. Why Visualize Data? 3. What do Visualizations do? 4. Research on Visualizations

Antrix Academy of Data Science TM

Survey of Math: Excel Spreadsheet Guide (for Excel 2016) Page 1 of 9

Getting Started with Minitab 18

Project 11 Graphs (Using MS Excel Version )

Page 1. Graphical and Numerical Statistics

LAB 1 INSTRUCTIONS DESCRIBING AND DISPLAYING DATA

If the active datasheet is empty when the StatWizard appears, a dialog box is displayed to assist in entering data.

Select Cases. Select Cases GRAPHS. The Select Cases command excludes from further. selection criteria. Select Use filter variables

Two-Stage Least Squares

STAT 311 (3 CREDITS) VARIANCE AND REGRESSION ANALYSIS ELECTIVE: ALL STUDENTS. CONTENT Introduction to Computer application of variance and regression

How to use FSBforecast Excel add in for regression analysis

CS130 Regression. Winter Winter 2014 CS130 - Regression Analysis 1

Course Outline. Writing Reports with Report Builder and SSRS Level 1 Course 55123: 2 days Instructor Led. About this course

Three-Dimensional (Surface) Plots

Scatterplot: The Bridge from Correlation to Regression

Introductory Applied Statistics: A Variable Approach TI Manual

Vocabulary. 5-number summary Rule. Area principle. Bar chart. Boxplot. Categorical data condition. Categorical variable.

INTRODUCTION TO SPSS OUTLINE 6/17/2013. Assoc. Prof. Dr. Md. Mujibur Rahman Room No. BN Phone:

MS in Applied Statistics: Study Guide for the Data Science concentration Comprehensive Examination. 1. MAT 456 Applied Regression Analysis

PubHlth 640 Intermediate Biostatistics Unit 2 - Regression and Correlation. Simple Linear Regression Software: Stata v 10.1

Chapter 2: Descriptive Statistics

Regression on SAT Scores of 374 High Schools and K-means on Clustering Schools

Chapter 3: Rate Laws Excel Tutorial on Fitting logarithmic data

Guide to Statistical Software

After opening Stata for the first time: set scheme s1mono, permanently

ST Lab 1 - The basics of SAS

Writing Reports with Report Designer and SSRS 2014 Level 1

Data Analysis Multiple Regression

Further Maths Notes. Common Mistakes. Read the bold words in the exam! Always check data entry. Write equations in terms of variables

Release notes for StatCrunch mid-march 2015 update

Using SPSS with The Fundamentals of Political Science Research

2010 by Minitab, Inc. All rights reserved. Release Minitab, the Minitab logo, Quality Companion by Minitab and Quality Trainer by Minitab are

Statistical Good Practice Guidelines. 1. Introduction. Contents. SSC home Using Excel for Statistics - Tips and Warnings

Excel Assignment 4: Correlation and Linear Regression (Office 2016 Version)

JMP Book Descriptions

CREATING THE ANALYSIS

Using Large Data Sets Workbook Version A (MEI)

This electronic supporting information S4 contains the main steps for fitting a response surface model using Minitab 17 (Minitab Inc.).

Multiple Linear Regression

Multivariate Capability Analysis

Resources for statistical assistance. Quantitative covariates and regression analysis. Methods for predicting continuous outcomes.

Acquisition Description Exploration Examination Understanding what data is collected. Characterizing properties of data.

Regression III: Advanced Methods

UNIT 1: NUMBER LINES, INTERVALS, AND SETS

2. Getting started with MLwiN

Introduction to STATA

GRAPHING BAYOUSIDE CLASSROOM DATA

Subject. Creating a diagram. Dataset. Importing the data file. Descriptive statistics with TANAGRA.

Data Analysis Guidelines

SAS Visual Analytics 8.2: Getting Started with Reports

STATGRAPHICS Sigma Express for Microsoft Excel. Getting Started

A Short Guide to Stata 10 for Windows

Selected Introductory Statistical and Data Manipulation Procedures. Gordon & Johnson 2002 Minitab version 13.

Introduction to Statistical Analyses in SAS

Table Of Contents. Table Of Contents

Table of Contents. Help/Information Help System System Information About MegaStat... 11

Product Catalog. AcaStat. Software

Getting Started with MegaStat

SAS Visual Analytics 7.2, 7.3, and 7.4: Getting Started with Analytical Models

Transcription:

i Applied Regression Modeling: A Business Approach Computer software help: SAS SAS (originally Statistical Analysis Software ) is a commercial statistical software package based on a powerful programming interface. It does, however, also have an easy-to-use graphical user-interface in its Analyst Application. Further information is available at www.sas.com. The following instructions are based on the Analyst Application for SAS 9.1 for Windows. The book website contains supplementary material for other versions of SAS. Getting started and summarizing univariate data 1 Change SAS s default options by selecting Tools > Options > Preferences. Start the Analyst Application by selecting Solutions > Analysis > Analyst. To open a SAS data file, select File > Open. Output appears in a separate window each time you run an analysis. Select Edit > Copy to Program Editor to copy the output to a Program Editor Window. From there, output can be copied and pasted from SAS to a word processor like Microsoft Word. Graphs appear in separate windows and can also easily be copied and pasted to other applications. If you misplace any output, you can easily retrieve it by clicking on the Analyst Window and using the left-hand Outline Pane. 2 You can access help on SAS Analyst by selecting Help > Using This Window or clicking the Analyst Help tool. For example, to find out about boxplots find Box Plots (under Creating Graphs in the main pane of the Help Window). 3 To transform data or compute a new variable, first select Edit > Mode > Edit to change the dataset from browse mode to edit mode. Then select Data > Transform > Compute. Type a name (with no spaces) for the new variable in the top-left box, and type a mathematical expression for the variable in the large box just below this. Current variables in the dataset can be moved into this box, while the keypad and list of functions can be used to create the expression. Examples are log(x) for the natural logarithm of X and X 2 for X 2. Hit OK to create the new variable which will be added to the dataset (check it looks correct in the spreadsheet); it can now be used just like any other variable. If you get the error message Unable to add a new column as specified, this means there is a syntax error in your expression a common mistake is to forget the multiplication symbol ( ) between a number and a variable (e.g., 2 X represents 2X). To create indicator (dummy) variables from a qualitative variable, first select Edit > Mode > Edit to change the dataset from browse mode to edit mode. Then select Data > Transform > Recode Values. Select the qualitative variable in the Column to recode box, type a name for the first indicator variable in the New column name box, make sure New column type is Numeric,

ii and press OK. In the subsequent Recode Values dialog box, type 1 into the box next to the appropriate level, and type 0 into the boxes next to each of the other levels. Hit OK and check that the correct indicator variable has been added to your spreadsheet. Repeat for other indicator variables (if necessary). 4 Calculate descriptive statistics for quantitative variables by selecting Statistics > Descriptive > Summary Statistics. Move the variable(s) into the Analysis list. Click Statistics to select the summaries, such as the Mean, that you would like. 5 Create contingency tables or cross-tabulations for qualitative variables by selecting Statistics > Table Analysis. Move one qualitative variable into the Row list and another into the Column list. Cell percentages (within rows, columns, or the whole table) can be calculated by clicking Tables. 6 If you have quantitative variables and qualitative variables, you can calculate descriptive statistics for cases grouped in different categories by selecting Statistics > Descriptive > Summary Statistics. Move the quantitative variable(s) into the Analysis list and the qualitative variable(s) into the Class list. Click Statistics to select the summaries that you would like. 7 SAS Analyst does not appear to offer an automatic way to make a stem-and-leaf plot for a quantitative variable. To make a histogram for a quantitative variable, select Graphs > Histogram. Move the variable into the Analysis box. 8 To make a scatterplot with two quantitative variables, select Graphs > Scatter Plot > Two-Dimensional. Move the vertical axis variable into the Y Axis box and the horizontal axis variable into the X Axis box. SAS Analyst does not appear to offer an automatic way to make a scatterplot matrix. 9 You can mark or label cases in a scatterplot with different colors/symbols according to the categories in a qualitative variable by moving the variable into the Class box in the Scatterplot dialog. SAS Analyst does not appear to offer an automatic way to change the colors/symbols used or to identify individual cases in a scatterplot. 10 To make a bar chart for cases in different categories, select Graphs > Bar Chart > Vertical. For frequency bar charts of one qualitative variable, move the variable into the Chart box. For frequency bar charts of two qualitative variables, move one variable into the Chart box and the other into the Group By box. The bars can also represent various summary functions for a quantitative variable. For example, to represent Means, select Options, click the Bar Values tab, move the quantitative variable into the Analysis box, and select Average for Statistic to chart.

iii 11 To make boxplots for cases in different categories, select Graphs > Box Plot. Move the qualitative variable into the Class box. Move the quantitative variable into the Analysis box. SAS Analyst does not appear to offer an automatic way to create clustered boxplots for two qualitative variables. 12 To make a QQ-plot (also known as a normal probability plot) for a quantitative variable, select Graphs > Probability Plot. Move the variable into the Analysis box and leave the Distribution as Normal to assess normality of the variable. 13 To compute a confidence interval for a univariate population mean, select Statistics > Hypothesis Tests > One-Sample t-test for a Mean. Move the variable for which you want to calculate the confidence interval into the Variable box and click the Tests button to bring up another dialog box in which you can select Interval under Confidence intervals and specify the confidence level for the interval. OK will take you back to the previous dialog box, where you can now hit OK. 14 To do a hypothesis test for a univariate population mean, select Statistics > Hypothesis Tests > One-Sample t-test for a Mean. Move the variable for which you want to do the test into the Variable box and type the (null) hypothesized value into the Mean = box. Specify a lower tailed ( less than ), upper tailed ( greater than ), or two tailed ( not equal ) alternate hypothesis. OK will take you back to the previous dialog box, where you can now hit OK. Simple linear regression 15 To fit a simple linear regression model (i.e., find a least squares line), select Statistics > Regression > Simple. Explanatory box. Just hit OK for now the other items in the dialog box are addressed below. 16 To include a regression line or least squares line on a scatterplot, select Statistics > Regression > Simple. Explanatory box. Before hitting OK, click the Plots button, and check Plot observed vs independent under Scatterplots. Click OK to return to the main Simple Linear Regression dialog box, and then hit OK. Click on the Analyst Window, and double click on Scatter plot under Simple Linear Regression in the left-hand Outline Pane to find the resulting graph.

iv 17 To find 95% confidence intervals for the regression parameters in a simple linear regression model, select Statistics > Regression > Simple. Explanatory box. Before hitting OK, click the Statistics button, and check Confidence limits for estimates under Parameter estimates. Click OK to return to the main Simple Linear Regression dialog box, and then hit OK. The confidence intervals are displayed as the final two columns of the Parameter Estimates output. This applies more generally to multiple linear regression also. 18 To find a 95% confidence interval for the mean of Y at a particular value of X in a simple linear regression model, select Statistics > Regression > Simple. Explanatory box. Before hitting OK, click the Save Data button and add L95M and U95M to the empty box in the subsequent Simple Linear Regression: Save Data dialog box. Check the Create and save diagnostics data box, click OK to return to the main Simple Linear Regression dialog box, and then hit OK. Click on the Analyst Window, and double click on Diagnostics Table under Simple Linear Regression > Diagnostics in the left-hand Outline Pane to find the results. The confidence intervals for the mean of Y at each of the X-values in the dataset are displayed as two columns headed L95M and U95M. You can also obtain a confidence interval for the mean of Y at an X-value that is not in the dataset by doing the following. Before fitting the regression model, create a dataset containing (just) the X-value in question (with the same variable name as in the original dataset), and save this dataset. Then fit the regression model and follow the steps above, but before hitting OK, click the Prediction button, click Predict additional data under Prediction input and locate the dataset you just saved under Data set name. Then check List predictions and Add prediction limits under Prediction output. Click OK to return to the main Simple Linear Regression dialog box, and then hit OK. Click on the Analyst Window, and double click on Predictions under Simple Linear Regression in the left-hand Outline Pane to find the results. This applies more generally to multiple linear regression also. 19 To find a 95% prediction interval for an individual value of Y at a particular value of X in a simple linear regression model, select Statistics > Regression > Simple. Explanatory box. Before hitting OK, click the Save Data button and add L95 and U95 to the empty box in the subsequent Simple Linear Regression: Save Data dialog box. Check the Create and save diagnostics data box, click OK to return to the main Simple Linear Regression dialog box, and then hit OK. Click on the Analyst Window, and double click on Diagnostics Table under Simple Linear Regression > Diagnostics in the left-hand Outline Pane to find the results. The prediction intervals for an individual value of Y at each of the X-values in the dataset are displayed as two columns headed L95 and U95. This applies more generally to multiple linear regression also. SAS Analyst does not appear to offer an automatic way to create a prediction interval for an individual Y-value at an X-value that is not in the dataset.

v Multiple linear regression 20 To fit a multiple linear regression model, select the Explanatory box. 21 To include a quadratic regression line on a scatterplot, select Statistics > Regression > Simple. Explanatory box, and change Model from Linear to Quadratic. Before hitting OK, click the Plots button, and check Plot observed vs independent under Scatterplots. Click OK to return to the main Simple Linear Regression dialog box, and then hit OK. Click on the Analyst Window, and double click on Scatter plot under Simple Linear Regression in the left-hand Outline Pane to find the resulting graph. 22 SAS Analyst does not appear to offer an automatic way to create a scatterplot with separate regression lines for subsets of the sample. 23 SAS Analyst does not appear to offer an automatic way to find the F-statistic and associated p-value for a nested model F-test in multiple linear regression. It is possible to calculate these quantities by hand using SAS Analyst regression output and appropriate percentiles from a F-distribution. 24 To save studentized residuals in a multiple linear regression model, select the Explanatory box. Before hitting OK, click the Save Data button and add STUDENT to the empty box in the subsequent Linear Regression: Save Data dialog box. Check the Create and save diagnostics data box, click OK to return to the main Linear Regression dialog box, and then hit OK. Click on the Analyst Window, and double click on Diagnostics Table under Linear Regression > Diagnostics in the left-hand Outline Pane to find the results. The studentized residuals are displayed as STUDENT. (SAS can also calculate deleted studentized residuals, which it calls RSTUDENT.) 25 SAS Analyst does not appear to offer an automatic way to add a loess fitted line to a scatterplot. 26 To save leverages in a multiple linear regression model, select the Explanatory box. Before hitting OK, click the Save Data button and add H to the empty box in the subsequent Linear Regression: Save Data dialog box. Check the Create and save diagnostics data box, click OK to return to the main Linear Regression dialog box, and then hit OK. Click on the Analyst Window, and double click on Diagnostics Table under Linear Regression > Diagnostics in the left-hand Outline Pane to find the results. The leverages are displayed as H.

vi 27 To save Cook s distances in a multiple linear regression model, select the Explanatory box. Before hitting OK, click the Save Data button and add COOKD to the empty box in the subsequent Linear Regression: Save Data dialog box. Check the Create and save diagnostics data box, click OK to return to the main Linear Regression dialog box, and then hit OK. Click on the Analyst Window, and double click on Diagnostics Table under Linear Regression > Diagnostics in the left-hand Outline Pane to find the results. Cook s distances are displayed as COOKD. 28 To create some residual plots automatically in a multiple linear regression model, select the Explanatory box. Before hitting OK, click the Plots button, and click the Residual tab in the subsequent Linear Regression: Plots dialog box. Check Plot residuals vs variables under Residual plots, and select Standardized for Residuals and Predicted Y for Variables to create a scatterplot of the studentized residuals on the vertical axis versus the standardized predicted values on the horizontal axis. You could also check Independents for Variables to create residual plots with each predictor variable on the horizontal axis. Click OK to return to the main Linear Regression dialog box, and then hit OK. Click on the Analyst Window, and double click on the resulting graphs under Linear Regression > Residual Plots in the left-hand Outline Pane. To create residual plots manually, first create studentized residuals (see computer help #24), and then construct scatterplots with these studentized residuals on the vertical axis. 29 To create a correlation matrix of quantitative variables (useful for checking potential multicollinearity problems), select Statistics > Descriptive > Correlations. Move the variables into the Correlate box and hit OK. 30 To find variance inflation factors in multiple linear regression, select the Explanatory box. Before hitting OK, click the Statistics button, and the Tests tab in the resulting Linear Regression: Statistics dialog box. Check Variance inflation factors under Collinearity, click OK to return to the main Linear Regression dialog box, and then hit OK. The variance inflation factors are in the last column of the Parameter Estimates output under Variance Inflation. 31 To draw a predictor effect plot for graphically displaying the effects of quantitative predictors in multiple linear regression, first create a variable representing the effect, say, X1effect (see computer help #3) this variable must just involve X1 (e.g., 1+3X1+4X1 2 ). Then select Graphs > Scatter Plot > Two-Dimensional. Move the X1effect variable into the Y Axis box and X1 into the X Axis box. Before hitting OK, click on Display and select Connect points with straight lines in the resulting 2-D Scatter Plot: Display dialog box. Click OK to return to the main 2-D Scatter Plot dialog box, and then hit OK.

SAS Analyst does not appear to offer an automatic way to create more complex predictor effect plots (say, with separate lines representing different subsets of the sample). vii