SPSS QM II. SPSS Manual, Quantitative Methods II (7.5 hp)


SHORT INSTRUCTIONS

This presentation contains only relatively short instructions on how to perform some statistical analyses in SPSS. Details of a function or analysis method that are not covered by these instructions are often more or less intuitive and self-explanatory. There is also a Help button in every dialog window that you can use to get more information.

BE CAREFUL

Statistical software has very limited ability to critically review the information that is entered and the results that are produced. It is therefore of the utmost importance to keep track of which assumptions need to be fulfilled in every situation, and how the results should be interpreted.

Please send an e-mail to inger.persson@statistik.uu.se if you discover anything that is incorrect in this document.

Updated 2013-05-07

Contents

1 Tip sheets, online introductions, and other manuals
2 Installing SPSS on your own computer
3 Missing value analysis
3.1 Variable overview
3.2 Pattern and extent of missing data among cases
3.3 Excluding one or more variables
3.4 Diagnose the randomness (MAR or MCAR) of the missing data process
3.4.1 T-tests for differences between groups with valid vs. missing data
3.4.2 Overall test of randomness (Little's MCAR test)
4 Imputation of missing data
4.1 Complete case approach
4.2 Use all available data
4.3 Mean substitution
4.4 Mean or median of nearby points
4.5 Regression imputation
5 Outlier detection
5.1 Univariate outlier detection
5.2 Bivariate outlier detection
5.3 Multivariate outlier detection
6 Creating dummy variables from categorical variables
7 Creating binary variables from numerical variables
8 Creating summated scales
9 Transformations of variables
10 Linear regression models
10.1 Stepwise estimation, forward addition, backward elimination
10.2 Levene's test for equality of variances (homoscedasticity test)
10.3 Residual plots
10.4 Partial regression plots
10.5 Confidence interval for the regression coefficient
10.6 Identifying outliers among the residuals (influential observations)
10.7 Confidence intervals for predicted mean values
10.8 Confidence intervals around forecasts
10.9 Assessing multicollinearity
10.10 Validation of regression results by using additional (or split) samples
10.10.1 Calculate predicted/forecasted values of Y in the new data set
11 Logistic regression models
11.1 Hosmer and Lemeshow measure of overall fit
11.2 Casewise diagnostics
12 ANOVA
12.1 One-way ANOVA
12.1.1 Normality tests and plots for one-way ANOVA
12.2 Two-way ANOVA
12.2.1 Normality tests and plots for two-way ANOVA
12.2.2 Plots of estimated marginal means
12.3 Descriptive statistics (including standard deviations for each group)

1 Tip sheets, online introductions, and other manuals

There is a manual for basic statistics used in the first Quantitative Methods course which might be helpful (SPSS manual QM 2012.pdf).

Excellent tip sheets for some SPSS procedures have been produced by the University of Reading. Some of them are used in this manual, and whenever they are, a link to the appropriate tip sheet is provided. Data for the tip sheet examples can be found at the bottom of this page:
http://www.reading.ac.uk/maths-and-stats/about/maths-tipsheets.aspx

Other, more extensive manuals can be found here:

IBM SPSS Statistics 20 Brief Guide (170 pages) describes how to open and import data files, edit data, produce summary statistics and some graphs, etc.
ftp://public.dhe.ibm.com/software/analytics/spss/documentation/statistics/20.0/en/client/manuals/ibm_spss_statistics_brief_guide.pdf

IBM SPSS Statistics 20 Core System Users Guide (446 pages) describes how to open, import, and export data files, edit and transform data, create pivot tables, etc.
ftp://public.dhe.ibm.com/software/analytics/spss/documentation/statistics/20.0/en/client/manuals/ibm_spss_statistics_core_system_users_guide.pdf

IBM SPSS Statistics Base 20 (328 pages) describes how to produce descriptive statistics and crosstabs, explore data (including Normality plots), perform t-tests, calculate correlations, fit linear regression models, and much more.
ftp://public.dhe.ibm.com/software/analytics/spss/documentation/statistics/20.0/en/client/manuals/ibm_spss_statistics_base.pdf

You can also find introductions to SPSS online, e.g. this one (on YouTube):
http://www.youtube.com/watch?v=ethvlezs7qq (approx. 10 minutes)

2 Installing SPSS on your own computer

If you wish to install SPSS on your own computer, you can download a free 14-day trial version here:
http://www14.software.ibm.com/download/data/web/en_us/trialprograms/w110742e06714b29.html

There are also student licenses available, for 6 or 12 months. SPSS Statistics Premium GradPack is needed for the methods described in this manual.

3 Missing value analysis

3.1 Variable overview

Choose Analyze >> Missing Value Analysis from the Menu tab. A dialog window will appear.

Add the variable(s) for which you want to perform a missing value analysis to the Quantitative Variables and/or Categorical Variables field. Alternatively, click Use All Variables to get a summary of all variables in the data set.

Then click OK to produce the overview.

3.2 Pattern and extent of missing data among cases

To investigate the extent and pattern of missing data, choose Analyze >> Missing Value Analysis from the Menu tab. Select variables as described in section 3.1 above. Then click Patterns, and mark Cases with missing values... in the dialog window that appears, to display the pattern of cases with missing data.

3.3 Excluding one or more variables

To see what will happen if one or more variables are excluded, choose Analyze >> Missing Value Analysis from the Menu tab. Select variables as described in section 3.1 above. Then click Patterns, as described in section 3.2 above. A dialog window will appear.

Mark Tabulated cases,... to get a summary of the missing data pattern when one or more variables are excluded.

3.4 Diagnose the randomness (MAR or MCAR) of the missing data process

There are two diagnostic tests that can be used to assess the level of randomness (MAR or MCAR), described in sections 3.4.1 and 3.4.2 below. As a result of these tests, the missing data process is classified as either MAR or MCAR.

3.4.1 T-tests for differences between groups with valid vs. missing data

Two groups of individuals are formed: one with missing values of Y, and another with valid values of Y. Then statistical tests (e.g. t-tests) are performed to see if differences exist between the two groups based on other variables of interest. Significant differences indicate the possibility of nonrandom missing data.
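The idea behind the group comparison can be sketched outside SPSS. The following Python example is only an illustration: the data are made up, and a separate-variance (Welch) t statistic is assumed as the test. It splits the cases by whether Y is missing and compares the two groups on another variable X.

```python
from math import sqrt
from statistics import mean, variance

# Hypothetical data: (Y, X) pairs, where None marks a missing value of Y.
records = [(3.1, 20), (None, 35), (2.8, 22), (None, 40),
           (3.5, 25), (3.0, 21), (None, 38)]

# Form the two groups based on whether Y is valid or missing.
x_valid = [x for y, x in records if y is not None]
x_missing = [x for y, x in records if y is None]

def welch_t(a, b):
    """Separate-variance t statistic for the difference in means of a and b."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error

t_stat = welch_t(x_valid, x_missing)
print(f"mean X (Y valid) = {mean(x_valid):.2f}, "
      f"mean X (Y missing) = {mean(x_missing):.2f}, t = {t_stat:.2f}")
```

A t statistic far from zero suggests that the cases with missing Y differ systematically on X, which points towards a nonrandom missing data process. The p-value is left out here because it requires the t distribution, which SPSS reports directly.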

To perform the t-tests, choose Analyze >> Missing Value Analysis from the Menu tab. Click Descriptives. A dialog window will appear. Mark t tests... to see if differences exist between two groups based on all other numerical variables of interest. If a nonrandom pattern is obvious, the missing data process is concluded to be MAR.

3.4.2 Overall test of randomness (Little's MCAR test)

An overall test of randomness compares patterns of missing data on all variables with the pattern expected for random missing data. If no significant differences are found, the missing data can be classified as MCAR. If significant differences are found, the nonrandom missing data processes have to be investigated.

To perform an overall test of randomness, choose Analyze >> Missing Value Analysis from the Menu tab. Under Estimation, mark EM to obtain Little's MCAR test, with the alternative hypothesis: The observed pattern of missing data differs from a random pattern. If the MCAR test is significant, the missing data process is concluded to be MAR.

4 Imputation of missing data

4.1 Complete case approach

To include only those observations with complete data, use Listwise deletion for each statistical method to be performed. E.g. for linear regression, choose Analyze >> Regression >> Linear from the Menu tab. Click Options, and mark Exclude cases listwise.

4.2 Use all available data

To use all available data (with different numbers of observations used in different analyses), use Pairwise deletion for each statistical method to be performed. E.g. for linear regression, choose Analyze >> Regression >> Linear from the Menu tab. Click Options, and mark Exclude cases pairwise.

4.3 Mean substitution

To replace the missing values of a variable with the mean value of that variable, choose Transform >> Replace Missing Values from the Menu tab. Choose which variable(s) to impute missing values for, and choose Method Series mean.
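What mean substitution does to the data can be illustrated with a small Python sketch. The values are hypothetical, and None stands in for a missing value.

```python
from statistics import mean

# Hypothetical variable with two missing values (None).
ages = [23, None, 31, 27, None, 35]

# The series mean is calculated from the valid values only.
valid_ages = [value for value in ages if value is not None]
series_mean = mean(valid_ages)

# Every missing value is replaced by that single mean.
imputed = [series_mean if value is None else value for value in ages]
print(imputed)
```

Since every missing value receives the same number, the variance of the imputed variable is smaller than the variance of the valid values alone.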

4.4 Mean or median of nearby points

To replace the missing values of a variable with the mean of the valid surrounding values, choose Transform >> Replace Missing Values from the Menu tab. Choose which variable(s) to impute missing values for, choose Method Mean of nearby points, and choose a number for Span of nearby points.

To replace the missing values with the median of the valid surrounding values, do as above but choose Method Median of nearby points.

4.5 Regression imputation

To replace the missing values of a variable with values predicted by regression analysis, choose Transform >> Replace Missing Values from the Menu tab. Choose which variable(s) to impute missing values for, and choose Method Linear trend at point.

5 Outlier detection

5.1 Univariate outlier detection

Choose Analyze >> Descriptive Statistics >> Descriptives from the Menu tab, and mark Save standardized values as variables. Then look among your variables in Data view (not in the output window, which shows a table with descriptive statistics by default). New, standardized variables have been created.

Sort the data by one Z-variable at a time to find potential outliers. Choose Data >> Sort Cases from the Menu tab, and choose the Z-variable you want to examine. Check the observations at the top and bottom of the sorted data set for each Z-variable, to find standardized values exceeding ±2.5 for small samples (increase the threshold value up to ±4 for large samples). Note which individuals have potential outliers, and for which variables.

You can also, if you wish, create an indicator variable as follows (for easier identification):

Transform >> Compute Variable
Name the Target variable, e.g. ZAge_4
Numeric expression: type 1
Click If
Mark Include if case satisfies condition and type ZAge > 4 or ZAge < -4

This will provide a new variable, with the value 1 if the standardized value exceeds ±4.
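The standardization and the ±2.5 rule for small samples can be sketched in Python as follows. The ages are made up, and the sample standard deviation (divisor n - 1) is assumed for the standardized values.

```python
from statistics import mean, stdev

# Hypothetical small sample with one suspicious value.
ages = [28, 31, 29, 33, 30, 27, 32, 30, 29, 31, 28, 95]

age_mean = mean(ages)
age_sd = stdev(ages)  # sample standard deviation (divisor n - 1)

# Standardized values, corresponding to a saved Z-variable such as ZAge.
z_ages = [(value - age_mean) / age_sd for value in ages]

# Indicator variable: 1 if the standardized value exceeds the threshold.
threshold = 2.5  # small sample; up to 4 for large samples
outlier_flag = [1 if abs(z) > threshold else 0 for z in z_ages]

flagged_cases = [i for i, flag in enumerate(outlier_flag) if flag == 1]
print(flagged_cases)
```

Only the last case (index 11, age 95) is flagged, with a standardized value of about 3.16.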

5.2 Bivariate outlier detection

Start by creating scatter plots for all pairs of variables. Choose Graphs >> Legacy Dialogs >> Scatter/Dot from the Menu tab, and choose Matrix scatter.

If you find an outlier in a scatterplot, it is easy to find out which individual the observation belongs to. Double click on the scatterplot to open the Chart Editor. Choose Elements >> Data Label Mode from the Menu tab in the Chart Editor. Then right click on the observation, and choose Go to case.

5.3 Multivariate outlier detection

To calculate the Mahalanobis D² measure you first need to decide which variables you want to calculate the multidimensional distance for. You need to decide on both the dependent variable and all independent variables to include in the multidimensional relationship.

Choose Analyze >> Regression >> Linear from the Menu tab. Click Save, and mark Mahalanobis. This will create a new variable, named MAH_1.

Then calculate D²/df, where df = the number of independent variables. Choose Transform >> Compute Variable from the Menu tab. Name the target variable, e.g. MAH_df, and write the numeric expression, e.g. MAH_1/8 if you have 8 independent variables.

You can again, if you wish, create an indicator variable (for easier identification) as described for univariate outlier detection in section 5.1 above.
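To make the D²/df calculation concrete, here is a minimal Python sketch for the special case of two variables, where the inverse of the 2x2 covariance matrix can be written out by hand. The data and the names x1 and x2 are hypothetical, and the sample covariance (divisor n - 1) is assumed.

```python
from statistics import mean

# Hypothetical values of two variables for six cases.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 20]
n = len(x1)
m1, m2 = mean(x1), mean(x2)

# Sample variances and covariance (divisor n - 1).
s11 = sum((a - m1) ** 2 for a in x1) / (n - 1)
s22 = sum((b - m2) ** 2 for b in x2) / (n - 1)
s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2)) / (n - 1)
det = s11 * s22 - s12 ** 2  # determinant of the covariance matrix

def mahalanobis_d2(a, b):
    """Squared Mahalanobis distance of the point (a, b) from the centroid."""
    d1, d2 = a - m1, b - m2
    return (s22 * d1 * d1 - 2 * s12 * d1 * d2 + s11 * d2 * d2) / det

df = 2  # number of variables in the distance
d2_over_df = [mahalanobis_d2(a, b) / df for a, b in zip(x1, x2)]

# The case with the largest D²/df is the most extreme one.
most_extreme = max(range(n), key=lambda i: d2_over_df[i])
print(most_extreme, round(d2_over_df[most_extreme], 3))
```

With more than two variables the same calculation needs a general matrix inverse, which is what SPSS handles when Mahalanobis is marked under Save.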

6 Creating dummy variables from categorical variables

Categorical variables have to be recoded as dummy variables in order to include them as explanatory variables in many multivariate techniques. Remember that ordered categorical variables can look like numerical variables in SPSS, if the categories are denoted by numbers!

The following example is used in the instructions below:

SocioEcStatus = Socioeconomic status
SchoolProgram = High school program (Vocation = vocational school, a work preparing program)
Reading score = Reading test score

To be able to recode the variables you need to know which values have been assigned to each variable. First, go to Variable view. Find the variable and click in its Values cell to get a list of the values for that variable.

In the example, Gender is already a dummy variable, where 1=female and 0=male. Socioeconomic status is a categorical variable. High school program is a categorical variable.

Then create the dummies needed by choosing Transform >> Recode into Different Variables from the Menu bar. A dialog window will appear.

1) Choose which input variable you want to use, to create the new variable from (Candy is being used in this example)
2) Type a name for the variable you are about to create
3) Click Change
4) Click Old and New Values

A new dialog window will appear.

1) To create a dummy for the category denoted by the value 2 (socioeconomic status = middle), select Value under Old Value and type the value you want to recode.
2) Under New Value, type 1 for the category you want the dummy to denote.
3) Click Add.

Repeat steps 1 to 3 above for all the other categories, letting system- and/or user-missing values still be missing, and using the New Value 0 for all categories other than the one this dummy will represent. Click Continue when you have coded all possible categories to missing, 0, or 1, and finally click OK.

If your categorical variable has more than two categories, you need to create one dummy for each of the categories except one (the reference category).

IMPORTANT! Make sure to visually check that your new dummy variable(s) contain the values that you intended. To make this check easier you might want to place the new variable next to the original one. A description of how to move a column can be found in the manual from the first Quantitative Methods course (SPSS manual QM 2012.pdf), section 5.6.
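The recoding logic can be summarized in a few lines of Python. This is only a sketch: the codes 1 = low and 3 = high are assumptions (only 2 = middle is given above), None stands in for a missing value, and make_dummy is a hypothetical helper name.

```python
def make_dummy(values, category):
    """Return 1 for the chosen category, 0 for other categories, None if missing."""
    return [None if value is None else int(value == category) for value in values]

# Hypothetical socioeconomic status codes: 1 = low, 2 = middle, 3 = high.
socio_ec_status = [1, 2, 3, None, 2, 1]

# Three categories need two dummies; low (1) is the reference category.
ses_middle = make_dummy(socio_ec_status, 2)
ses_high = make_dummy(socio_ec_status, 3)

# Visual check: print the original variable next to the new dummies.
for row in zip(socio_ec_status, ses_middle, ses_high):
    print(row)
```

Cases in the reference category have 0 on both dummies, and a missing value stays missing on both.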

7 Creating binary variables from numerical variables

A binary variable denoting e.g. obesity based on the waist-hip ratio can be created as follows:

Transform >> Compute Variable
Name the Target variable, e.g. Obese
Numeric expression: type 1
Click If
Mark Include if case satisfies condition and type
(WaisteHip > 1 and Gender='Male') or (WaisteHip > 0.85 and Gender='Female')

You then get a new variable, with the value 1 if the waist-hip ratio exceeds 1 for males and 0.85 for females.

Make sure that missing values of the waist-hip ratio (coded 9999) are not coded as obesity! This can e.g. be done by using the following condition:
((WaisteHip > 1 and Gender='Male') or (WaisteHip > 0.85 and Gender='Female')) and WaisteHip ~= 9999
(Make sure to get all parentheses right! ~= means not equal to.)

You also need the variable to take the value 0 if the individuals are not obese. This can be done similarly:

Transform >> Recode into Same Variables
Choose the variable you created above
Click If
Mark Include if case satisfies condition and type WaisteHip ~= 9999
Then click Old and New Values
Under Old Value mark System- or user-missing, and under New Value type 0

Check that the variable now contains the values 1 and 0, and is missing for missing values of the waist-hip ratio (9999).

A binary variable denoting e.g. overweight (or obesity) based on BMI (BMI>25 or BMI>30) can be created as follows:

Transform >> Compute Variable
Name the Target variable, e.g. Overweight
Numeric expression: type 1
Click If
Mark Include if case satisfies condition and type BMI > 25

You then get a new variable, with the value 1 if BMI exceeds 25. You also need the variable to take the value 0 if the individuals are not overweight. This can be done similarly:

Transform >> Compute Variable
Name the Target variable the same as above, e.g. Overweight
Numeric expression: type 0
Click If
Mark Include if case satisfies condition and type BMI <= 25 and MISSING(BMI) ~= 1
(MISSING(BMI) returns a value of 1 if the value of BMI is missing. ~= means not equal to.)

Check that the variable now contains the values 1 and 0, and is missing for missing values of BMI.

8 Creating summated scales

To create summated scales, i.e. to combine several variables into one measure, choose Transform >> Compute Variable from the Menu tab. In the following example a variable denoting the frequency of eating sweets is created, where sweets are defined as crisps, candy, cakes, ice cream, or soda. The variables that the summary variable is based on are coded as follows:

1 = Never
2 = < 1 time/month
3 = 1 time/month
4 = 2-3 times/month
5 = 1 time/week
6 = 2-3 times/week
7 = 4-6 times/week
8 = Daily

Create the values in the order given below, using one Compute Variable step with an If condition for each value. The conditions for the values 2 to 4 overlap, so a later step is meant to overwrite the value set by an earlier one.

Let the combined variable e.g. get the value 1 if all of the separate sweets are eaten at most 1 time/month (the following expression can be copied and pasted to SPSS):
(Crisps=1 or Crisps=2 or Crisps=3) and (Candy=1 or Candy=2 or Candy=3) and (Cakes=1 or Cakes=2 or Cakes=3) and (IceCream=1 or IceCream=2 or IceCream=3) and (Soda=1 or Soda=2 or Soda=3)

And the value 2 if at least one of the separate sweets is eaten 2-5 times/month:
(Crisps=4 or Crisps=5) or (Candy=4 or Candy=5) or (Cakes=4 or Cakes=5) or (IceCream=4 or IceCream=5) or (Soda=4 or Soda=5)

And the value 3 if at least one of the separate sweets is eaten 2-6 times/week:
(Crisps=6 or Crisps=7) or (Candy=6 or Candy=7) or (Cakes=6 or Cakes=7) or (IceCream=6 or IceCream=7) or (Soda=6 or Soda=7)

And the value 4 if at least one of the separate sweets is eaten daily:
Crisps=8 or Candy=8 or Cakes=8 or IceCream=8 or Soda=8
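The four conditions for the sweets variable amount to classifying each individual by the most frequently eaten sweet. The Python sketch below expresses that reading; the function name is hypothetical, and missing values are not handled here.

```python
def sweets_frequency(crisps, candy, cakes, icecream, soda):
    """Combine five items coded 1-8 into one scale with the values 1-4."""
    highest = max(crisps, candy, cakes, icecream, soda)
    if highest <= 3:   # all sweets eaten at most 1 time/month
        return 1
    if highest <= 5:   # most frequent sweet eaten 2-5 times/month
        return 2
    if highest <= 7:   # most frequent sweet eaten 2-6 times/week
        return 3
    return 4           # at least one sweet eaten daily

# A person who eats cakes 4-6 times/week (code 7) and candy 2-3 times/month (code 4).
print(sweets_frequency(1, 4, 7, 1, 1))
```

Taking the maximum gives the same result as applying the four SPSS conditions in ascending order, with each later condition overwriting the earlier value.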

9 Transformations of variables

To transform a variable, choose Transform >> Compute Variable from the Menu tab. Numeric expressions for the most common transformations are described below (the variable Age is being transformed):

Inverse: 1/Age
Logarithm: LG10(Age)
Square root: SQRT(Age)
Squares: Age*Age

Other expressions for mathematical calculations can be found in the Compute Variable dialog window: under Function group, click Arithmetic. A number of functions will then be listed under Functions and Special Variables in the lower right box. Double click on one of the functions and a numeric expression will appear, with a question mark denoting where you should enter the variable you want to make the calculation for.
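For reference, the four transformations correspond to the following calculations, shown here as a small Python sketch for a single hypothetical value of Age. LG10 is the base-10 logarithm.

```python
import math

age = 25.0  # hypothetical value

inverse_age = 1 / age         # Inverse: 1/Age
log10_age = math.log10(age)   # Logarithm: LG10(Age)
sqrt_age = math.sqrt(age)     # Square root: SQRT(Age)
squared_age = age * age       # Squares: Age*Age

print(inverse_age, round(log10_age, 4), sqrt_age, squared_age)
```

The inverse and the logarithm are undefined for the value 0, and the logarithm and square root are undefined for negative values, so check the range of the variable before transforming it.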

10 Linear regression models

10.1 Stepwise estimation, forward addition, backward elimination

Choose Analyze >> Regression >> Linear from the Menu tab. Method: choose Stepwise, Forward, or Backward. Choose Options to select the significance level.

10.2 Levene's test for equality of variances (homoscedasticity test)

Choose Analyze >> Descriptive Statistics >> Explore from the Menu tab. Choose Dependent variable and Factor variable (denoting the groups). Click Plots, and mark Untransformed under Levene test.

10.3 Residual plots

Choose Analyze >> Regression >> Linear from the Menu tab. Click Save, and mark Unstandardized Predicted Values and Standardized Residuals. Then produce a scatter plot with Y=ZRE_1 and X=PRE_1.

Also review the distribution of the standardized residuals saved above, along with a Normality plot to check the assumption of Normality. Normality tests can also be used. (Normality plots and tests are described in SPSS manual QM 2013.pdf)

10.4 Partial regression plots

Choose Analyze >> Regression >> Linear from the Menu tab. Click Plots, and mark Produce all partial plots.

10.5 Confidence interval for the regression coefficient

Choose Analyze >> Regression >> Linear from the Menu tab. Click Statistics, and mark Confidence intervals.

10.6 Identifying outliers among the residuals (influential observations)

Choose Analyze >> Regression >> Linear from the Menu tab. Click Save, and mark Standardized Residuals. Then choose Data >> Sort Cases from the Menu tab, and sort by the variable containing the residuals. Residuals exceeding ±2 (more than 2 standard deviations from the mean of the residuals) are identified as possible outliers.

10.7 Confidence intervals for predicted mean values

Choose Analyze >> Regression >> Linear from the Menu tab. Click Save, and mark Mean under Prediction intervals.

10.8 Confidence intervals around forecasts

Choose Analyze >> Regression >> Linear from the Menu tab. Click Save, and mark Individual under Prediction intervals.

10.9 Assessing multicollinearity

Choose Analyze >> Regression >> Linear from the Menu tab. Click Statistics, and mark Collinearity diagnostics.

10.10 Validation of regression results by using additional (or split) samples

You can use the additional data set to validate your final regression model in two ways:

1) Use your regression equation to calculate predicted/forecasted values of Y in the new data set, as described in section 10.10.1 below.
2) Estimate a new regression equation based on the new data set, with the same dependent and independent variables as in your final model. Then compare the two models to see if they are approximately the same with regard to regression coefficients (and their corresponding confidence intervals), the standard error of the estimate (SE_E), R², etc.

10.10.1 Calculate predicted/forecasted values of Y in the new data set

One way to validate your final regression model is to use your regression equation to calculate predicted/forecasted values of Y in the new data set. E.g., for the regression equation Y = 4.2 + 0.34 Age - 0.97 PAL do as follows:

Transform >> Compute Variable
Name the Target variable, e.g. Ypred
Numeric expression: type 4.2+0.34*Age-0.97*PAL

This creates a new variable, Ypred. This variable is to be compared to the actual values of Y, e.g. by plotting the two variables against each other:

Graphs >> Legacy Dialogs >> Scatter/Dot
Choose Simple Scatter

If the actual values and the forecasted values are similar, the points in the scatter plot should follow a 45 degree line.

11 Logistic regression models

To estimate a logistic regression model, choose Analyze >> Regression >> Binary Logistic from the Menu tab.

11.1 Hosmer and Lemeshow measure of overall fit

Choose Analyze >> Regression >> Binary Logistic from the Menu tab. Click Options, and mark Hosmer-Lemeshow goodness-of-fit.

11.2 Casewise diagnostics

Choose Analyze >> Regression >> Binary Logistic from the Menu tab. Click Options, and mark Casewise listing of residuals.
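Returning to the validation step for linear regression in section 10.10.1, the calculation of forecasted values in a new data set can be sketched in Python. The coefficients are those of the example equation, assuming that the sign in front of 0.97 is a minus. The new data set is made up.

```python
def predict_y(age, pal):
    """Forecast from the example equation (minus sign before 0.97 is assumed)."""
    return 4.2 + 0.34 * age - 0.97 * pal

# Hypothetical new data set: (Age, PAL, actual Y).
new_data = [(30, 2.0, 12.1), (45, 1.5, 18.3), (52, 3.0, 18.9)]

predicted = [predict_y(age, pal) for age, pal, _ in new_data]
differences = [actual - forecast
               for (_, _, actual), forecast in zip(new_data, predicted)]

for (age, pal, actual), forecast in zip(new_data, predicted):
    print(f"Age={age}, PAL={pal}: actual={actual}, predicted={forecast:.2f}")
```

Differences close to zero for all cases correspond to points lying close to the 45 degree line in the scatter plot of actual against forecasted values.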

12 ANOVA

12.1 One-way ANOVA

To perform a one-way Analysis of Variance (ANOVA), choose Analyze >> General Linear Model >> Univariate from the Menu tab. Choose your variable of interest as Dependent Variable, and your independent/explanatory variable under Fixed Factor(s).

You can also choose Analyze >> Compare Means >> One-Way ANOVA. Choose your variable of interest under Dependent List, and the grouping variable under Factor. This option will however not provide a measure of R².

12.1.1 Normality tests and plots for one-way ANOVA

Choose Analyze >> Descriptive Statistics >> Explore from the Menu tab. Add the variable of interest under Dependent List, and the grouping variable under Factor List. Click Plots and mark Normality plots with tests. You can also get histograms (per group) from this option, by clicking Plots and marking Histogram. Boxplots are provided by default.

12.2 Two-way ANOVA

To perform a two-way Analysis of Variance (ANOVA), choose Analyze >> General Linear Model >> Univariate from the Menu tab. Choose your variable of interest as Dependent Variable, and your independent/explanatory variables under Fixed Factor(s).

12.2.1 Normality tests and plots for two-way ANOVA

Choose Data >> Split File from the Menu tab. Mark Organize output by groups and add the two factor variables. Then choose Analyze >> Descriptive Statistics >> Explore from the Menu tab, and add the variable of interest under Dependent List. Leave the Factor List field empty. Click Plots and mark Normality plots with tests.

12.2.2 Plots of estimated marginal means

Choose Analyze >> General Linear Model >> Univariate from the Menu tab. Click Plots. Add one of the factors to Horizontal Axis, the other to Separate Lines, and click Add.

12.3 Descriptive statistics (including standard deviations for each group)

Choose Analyze >> General Linear Model >> Univariate from the Menu tab. Click Options, and mark Descriptive statistics.
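As a rough check on the numbers reported for a one-way ANOVA, the F statistic, R², and the group standard deviations can be calculated by hand. The Python sketch below uses made-up data for three groups.

```python
from statistics import mean, stdev

# Hypothetical observations of the dependent variable in three groups.
groups = {"A": [1, 2, 3], "B": [2, 3, 4], "C": [5, 6, 7]}

all_values = [value for values in groups.values() for value in values]
grand_mean = mean(all_values)

# Between-groups and within-groups sums of squares.
ss_between = sum(len(values) * (mean(values) - grand_mean) ** 2
                 for values in groups.values())
ss_within = sum((value - mean(values)) ** 2
                for values in groups.values() for value in values)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)

f_stat = (ss_between / df_between) / (ss_within / df_within)
r_squared = ss_between / (ss_between + ss_within)

# Standard deviation for each group, as in section 12.3.
group_sd = {name: stdev(values) for name, values in groups.items()}

print(f"F = {f_stat:.2f}, R2 = {r_squared:.4f}, SDs = {group_sd}")
```

For these data the between-groups sum of squares is 26 and the within-groups sum of squares is 6, giving F = 13 with 2 and 6 degrees of freedom and R² = 0.8125.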