An introduction to SPSS


To open the SPSS software using the U of Iowa Virtual Desktop, go to https://virtualdesktop.uiowa.edu and choose SPSS 24.

NOTE: Save data files in a drive that is accessible from the virtual desktop. This is probably your H: drive through the university.

Contents

1 Example SPSS Data Set from UCLA
2 Uploading data to SPSS
  2.1 SPSS User Windows
    2.1.1 Data View
    2.1.2 Variable View
    2.1.3 Output Viewer
3 Simple Plots and Correlation in SPSS
  3.1 Bar Chart
  3.2 Scatterplots
  3.3 Correlation
4 Variable manipulation
  4.1 Standardizing Variables
  4.2 Transforming or Combining Variables
  4.3 Dichotomizing a continuous variable
5 Logistic Regression
6 Two-way tables (categorical variables)
7 Split-plot with whole plot as CRD (Type I, assumes sphericity)
  7.1 Using General Linear Model with Repeated option
    7.1.1 Format of data
    7.1.2 Modeling the data
    7.1.3 SPSS output for tests
    7.1.4 Profile plots (means)
    7.1.5 SAS output for same example
    7.1.6 Comment on Residuals
  7.2 Using Mixed Models option
    7.2.1 Format of data
    7.2.2 Modeling the data
    7.2.3 SPSS output for tests

1 Example SPSS Data Set from UCLA

UCLA Academic Technology Services has some nice statistical software examples available online, and we will use one of their SPSS data sets for this introduction. The data file is called binary.sav, and it is available at the URL below and also at our class website in the datasets link.

http://www.ats.ucla.edu/stat/data/binary.sav

This data set contains variables related to admission to graduate school.

Variables:
admit -- Admission status to graduate school (0=no, 1=yes).
gre -- Graduate Record Exam scores. Values range from 200 to 800.
gpa -- Grade Point Average.
rank -- Prestige of undergraduate school. Values are 1 through 4. Institutions with a rank of 1 have the highest prestige, while those with a rank of 4 have the lowest.

2 Uploading data to SPSS

Open binary.sav in SPSS using File → Open → Data...

2.1 SPSS User Windows

Upon opening SPSS, you'll see the window called Data Editor. Within this window, you have two views of the data from which to choose. One looks like a spreadsheet of the actual data (Data View), and the other gives you information on the variables in the data set (Variable View). You can move back and forth between the two views by clicking on the respective tab at the bottom left of the window. After you import data or run any options, you will also see an Output Viewer window appear.

2.1.1 Data View

- Can be used in a manner similar to a spreadsheet.
- Allows the user to enter data.
- A new column can be entered (highlight the column location, then Edit → Insert Variable).
- A new row can be entered (highlight the row location, then Edit → Insert Cases).
- You can use a formula to create a new variable, or transform a variable.

2.1.2 Variable View

- Allows the user to input characteristics of variables (attributes, coding for missing values, the levels of class variables, etc.).
- Allows the user to quickly view overall characteristics of the variables, like how many variables are in the data set, how many are categorical (nominal) or continuous (scale), etc.
- Many SPSS data sets arise from surveys and have 100s of variables, which represent questions on the survey. It can take a long time to input this information, but it is quite useful.

2.1.3 Output Viewer

- Contains the output generated by statistical procedures.
- Output location for graphics and plots.
- Can be saved separately from the .sav data file as a .spv file.
- Also serves as a log box.

3 Simple Plots and Correlation in SPSS

NOTE: For the rest of the simple introduction (Sections 3-6), we will continue to use the binary.sav data set.

ACTION REQUIRED: Change Variable Type
Go to Variable View and make sure rank is set to ordinal. If not, change it to ordinal.

3.1 Bar Chart

Graphs → Chart Builder → Bar

- Drag the Simple Bar visual (1st visual in top row) to the plotting area.
- Drag rank to the x-axis, then click OK.

3.2 Scatterplots

Graphs → Chart Builder

- You'll see the Chart Builder dialog box appear.
- Highlight Scatter/Dot in the Gallery box.
- Drag the Grouped Scatter visual (2nd visual in top row) to the plotting area.
- Drag gre to the x-axis and gpa to the y-axis.
- Drag rank to the Set color box.
- Click OK (output appears in the Output Viewer window).


3.3 Correlation

Analyze → Correlate → Bivariate

- You'll see the Bivariate Correlations dialog box appear.
- Highlight gre, then click the arrow to move it to the Variables box at the right.
- Highlight gpa, then click the arrow to move it to the Variables box at the right.
- Click OK. The output will appear in the Output Viewer window.

4 Variable manipulation

4.1 Standardizing Variables

In SPSS, you can quickly standardize a variable and include the standardized z-scores as another column in the data set.

Analyze → Descriptive Statistics → Descriptives...

- Highlight gre, then click the arrow to move it to the Variables box at the right.
- Highlight gpa, then click the arrow to move it to the Variables box at the right.
- Check the box: Save standardized values as variables.
- Click OK (the new variables will appear in the Data View).

Run a correlation analysis on these two standardized variables and compare the results to the correlation analysis using the unstandardized variables.
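The standardization step can also be sketched outside SPSS. The minimal Python example below assumes that the saved z-scores use the sample standard deviation (n - 1 in the denominator), and it works on a handful of illustrative gre and gpa values rather than the actual contents of binary.sav; the function names are hypothetical. It also shows what to expect from the comparison above: Pearson's r is unchanged by standardizing.

```python
from math import sqrt
from statistics import mean, stdev

def zscores(values):
    """Standardize values using the sample mean and sample SD (assumed convention)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def pearson_r(x, y):
    """Pearson correlation computed from sums of squares and cross-products."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Illustrative values only (assumed, not read from binary.sav).
gre = [380, 660, 800, 640, 520]
gpa = [3.61, 3.67, 4.00, 3.19, 2.93]

zgre, zgpa = zscores(gre), zscores(gpa)
r_raw = pearson_r(gre, gpa)
r_std = pearson_r(zgre, zgpa)

print(f"r on raw scores:      {r_raw:.4f}")
print(f"r on standardized:    {r_std:.4f}")
```

The two printed correlations agree because subtracting a constant and dividing by a positive constant does not change how two variables move together.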

4.2 Transforming or Combining Variables

- Highlight an empty column in the Data View.
- Transform → Compute Variable...
- Create a new variable called PreKnowledge, which is the average of the two standardized variables, by inputting the formula (Zgre+Zgpa)/2, then clicking OK.

4.3 Dichotomizing a continuous variable

Transform → Recode into Different Variables...

- You'll see the Recode into Different Variables dialog box appear.
- Highlight gre and put it into the Variables box.
- Provide a new name called codedgre in the Name box, and press Change.
- Click on Old and New Values.
- Under Old Value, select Range, LOWEST through value: 600
- Under New Value, select Value: 0
- Click Add
- Under Old Value, select Range, value through HIGHEST: 601
- Under New Value, select Value: 1
- Click Add, click Continue, click Change, click OK.

The new variable will now appear in the Data View window.
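The two recode rules amount to a simple threshold. The Python sketch below mirrors them with a hypothetical function name and illustrative scores (not read from binary.sav). It returns None for a value that falls strictly between 600 and 601, since neither rule covers that gap; our understanding is that SPSS would leave such a case missing in the new variable, which does not matter here as long as the gre scores are whole numbers.

```python
def recode_gre(gre, cutoff=600):
    """Mirror the two rules: LOWEST through 600 -> 0, 601 through HIGHEST -> 1."""
    if gre <= cutoff:
        return 0
    if gre >= cutoff + 1:
        return 1
    return None  # neither rule matches (only possible for non-integer scores)

# Illustrative scores only (assumed values).
gre_scores = [380, 600, 620, 800]
codedgre = [recode_gre(g) for g in gre_scores]
print(codedgre)
```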

5 Logistic Regression

Analyze → Generalized Linear Models → Generalized Linear Models...

- On the Type of Model tab: choose Binary logistic.
- On the Response tab: enter admit as the dependent variable. Also click Reference Category... and select First (lower value) in order to model the 1s, not the 0s.
- On the Predictors tab: enter rank as a Factor, and enter gre and gpa as Covariates.
- On the Model tab: include the desired terms in the model. (For this example, highlight all and choose Main effects for simplicity. If you want to enter an interaction, highlight two variables at once, then choose Interaction.)
- Click OK.

SPSS output (Generalized Linear Models):

Model Information
  Dependent Variable         admit (a)
  Probability Distribution   Binomial
  Link Function              Logit
  a. The procedure models 1 as the response, treating 0 as the reference category.

Case Processing Summary
              N     Percent
  Included    400   100.0%
  Excluded    0     0.0%
  Total       400   100.0%

Omnibus Test (a)
  Likelihood Ratio Chi-Square   df   Sig.
  41.459                        5    .000
  Dependent Variable: admit
  Model: (Intercept), rank, gre, gpa
  a. Compares the fitted model against the intercept-only model.

Tests of Model Effects (Type III)
  Source        Wald Chi-Square   df   Sig.
  (Intercept)   19.234            1    .000
  rank          20.895            3    .000
  gre           4.284             1    .038
  gpa           5.872             1    .015
  Dependent Variable: admit
  Model: (Intercept), rank, gre, gpa

Parameter Estimates
                                   95% Wald Confidence Interval   Hypothesis Test
  Parameter     B        Std. Error   Lower     Upper             Wald Chi-Square   df   Sig.
  (Intercept)   -5.541   1.1381       -7.772    -3.311            23.709            1    .000
  [rank=1]      1.551    .4178        .733      2.370             13.787            1    .000
  [rank=2]      .876     .3667        .157      1.595             5.706             1    .017
  [rank=3]      .211     .3929        -.559     .981              .289              1    .591
  [rank=4]      0 (a)    .            .         .                 .                 .    .
  gre           .002     .0011        .000      .004              4.284             1    .038
  gpa           .804     .3318        .154      1.454             5.872             1    .015
  (Scale)       1 (b)
  Dependent Variable: admit
  Model: (Intercept), rank, gre, gpa
  a. Set to zero because this parameter is redundant.
  b. Fixed at the displayed value.

- Output shows rank group 4 is the baseline (or reference) group.

6 Two-way tables (categorical variables)

Analyze → Descriptive Statistics → Crosstabs...

- Highlight admit and put it into the Rows box.
- Highlight codedgre and put it into the Columns box.
- Click on Cells... and under Percentages, choose Column, then Continue.
- Click on Statistics... and choose Chi-square, then Continue, then OK.
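Looking back at the Parameter Estimates table in Section 5, the B column is on the log-odds scale, so exponentiating a coefficient and its Wald limits gives an odds ratio relative to the reference group (rank 4) or per one-unit increase in a covariate. The short Python sketch below does this arithmetic on the rounded values printed in that table; the dictionary layout is our own, and the results inherit the rounding of the printed output.

```python
from math import exp

# (B, lower Wald limit, upper Wald limit) copied from the Parameter Estimates table.
estimates = {
    "[rank=1]": (1.551, 0.733, 2.370),
    "[rank=2]": (0.876, 0.157, 1.595),
    "[rank=3]": (0.211, -0.559, 0.981),
    "gpa": (0.804, 0.154, 1.454),
}

# Exponentiate each entry to move from log-odds to odds ratios.
odds_ratios = {
    name: tuple(exp(value) for value in row)
    for name, row in estimates.items()
}

for name, (ratio, lower, upper) in odds_ratios.items():
    print(f"{name:9s} OR = {ratio:.2f}  (95% CI {lower:.2f} to {upper:.2f})")
```

An interval that contains 1, as for [rank=3], corresponds to a coefficient that is not significantly different from zero in the table.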

7 Split-plot with whole plot as CRD (Type I, assumes sphericity)

This statistical model has a between-subject and a within-subject factor (or factors). The example we use here was seen in STAT:5201. There were two factors (DayLength and Climate), each with two levels, that formed the four treatments in the between-subject effects. There was also one within-subject factor (Tissue) with two levels. There were two hamsters in each of the four between-subject treatment groups. Thus, each hamster was nested in a particular DayLength/Climate combination and provided two observations in the analysis (one under each Tissue level).

7.1 Using General Linear Model with Repeated option

7.1.1 Format of data

One way to perform this analysis in SPSS is to approach it as a multivariate response. In that case, we need to format the data so that each row is associated with one hamster. Open the split plot hamsters.sav in SPSS using File → Open → Data...

7.1.2 Modeling the data

Choose Analyze → General Linear Model → Repeated Measures...

Then give your within-subject factor a name (such as Tissue here) and state how many levels it has, then press Add and Define. Next, move the column names that correspond to the within-subject factor levels into the upper box, and define your model by putting the between-subject factors in the appropriate box below (the default model includes all interactions between these factors).

Next, use the Options button to open the window below. This window will show you all the different interactions that will be tested as part of your analysis. If you don't want all of these results, you can select just specific main effects and interactions by using the Model button and the Custom option in the main dialog window. Click Continue.

Click the Save button and request to save the residuals and predicted values. This will save new columns in your present data set. Click Continue and then OK.

7.1.3 SPSS output for tests

First, you'll get the within-subject tests.

Tests of Within-Subjects Effects
Measure: MEASURE_1
  Source                                              Type III Sum of Squares   df      Mean Square   F
  Tissue                         Sphericity Assumed   385.730                   1       385.730       133.155
                                 Greenhouse-Geisser   385.730                   1.000   385.730       133.155
                                 Huynh-Feldt          385.730                   1.000   385.730       133.155
                                 Lower-bound          385.730                   1.000   385.730       133.155
  Tissue * DayLength             Sphericity Assumed   27.563                    1       27.563        9.515
                                 Greenhouse-Geisser   27.563                    1.000   27.563        9.515
                                 Huynh-Feldt          27.563                    1.000   27.563        9.515
                                 Lower-bound          27.563                    1.000   27.563        9.515
  Tissue * Climate               Sphericity Assumed   6.325                     1       6.325         2.183
                                 Greenhouse-Geisser   6.325                     1.000   6.325         2.183
                                 Huynh-Feldt          6.325                     1.000   6.325         2.183
                                 Lower-bound          6.325                     1.000   6.325         2.183
  Tissue * DayLength * Climate   Sphericity Assumed   2.418                     1       2.418         .835
                                 Greenhouse-Geisser   2.418                     1.000   2.418         .835
                                 Huynh-Feldt          2.418                     1.000   2.418         .835
                                 Lower-bound          2.418                     1.000   2.418         .835
  Error(Tissue)                  Sphericity Assumed   11.587                    4       2.897
                                 Greenhouse-Geisser   11.587                    4.000   2.897
                                 Huynh-Feldt          11.587                    4.000   2.897
                                 Lower-bound          11.587                    4.000   2.897

And then the between-subject tests farther down in the output.

Tests of Between-Subjects Effects
Measure: MEASURE_1
Transformed Variable: Average
  Source                Type III Sum of Squares   df   Mean Square   F        Sig.
  Intercept             626.751                   1    626.751       96.498   .001
  DayLength             42.445                    1    42.445        6.535    .063
  Climate               18.233                    1    18.233        2.807    .169
  DayLength * Climate   1.538                     1    1.538         .237     .652
  Error                 25.980                    4    6.495
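The two tables use two different error terms, which is the defining feature of the split-plot analysis: within-subject effects are tested against Error(Tissue), and between-subject effects against the between-subject Error. The Python sketch below recomputes each F ratio as a mean square divided by the matching error mean square, using the rounded values printed above, so the results agree with the SPSS F column only up to rounding. The variable names are our own.

```python
# Error mean squares from the two SPSS tables.
ms_error_within = 2.897    # Error(Tissue), split-plot level
ms_error_between = 6.495   # Error, whole-plot (hamster) level

# Effect mean squares, grouped by the error term they are tested against.
within_effects = {
    "Tissue": 385.730,
    "Tissue * DayLength": 27.563,
    "Tissue * Climate": 6.325,
    "Tissue * DayLength * Climate": 2.418,
}
between_effects = {
    "DayLength": 42.445,
    "Climate": 18.233,
    "DayLength * Climate": 1.538,
}

f_within = {name: ms / ms_error_within for name, ms in within_effects.items()}
f_between = {name: ms / ms_error_between for name, ms in between_effects.items()}

for name, f_value in {**f_between, **f_within}.items():
    print(f"{name:30s} F = {f_value:.3f}")
```

Testing a between-subject effect such as DayLength against Error(Tissue) instead would overstate its F ratio, because the hamster-to-hamster variation would be left out of the denominator.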

7.1.4 Profile plots (means)

You can also request some profile plots by clicking on the Plots... option and inputting the horizontal axis variable (DayLength), the separate lines variable (Climate), and the separate plots variable (Tissue).

[Figure: Estimated Marginal Means of MEASURE_1 at Tissue = 1. Horizontal axis: DayLength (long, short); separate lines: Climate (cold, warm).]

[Figure: Estimated Marginal Means of MEASURE_1 at Tissue = 2. Horizontal axis: DayLength (long, short); separate lines: Climate (cold, warm).]

7.1.5 SAS output for same example

From PROC MIXED, we see the tests for fixed effects are the same.

Type 3 Tests of Fixed Effects
  Effect                 Num DF   Den DF   F Value   Pr > F
  DayLength              1        4        6.54      0.0629
  Climate                1        4        2.81      0.1692
  DayLength*Climate      1        4        0.24      0.6520
  Tissue                 1        4        133.16    0.0003
  DayLength*Tissue       1        4        9.51      0.0368
  Climate*Tissue         1        4        2.18      0.2136
  DayLen*Climat*Tissue   1        4        0.83      0.4126

From PROC GLM, we see that all the sums of squares are equivalent and that there are two distinct errors (for the whole-plot level and the split-plot level). This matches the SPSS output.

The GLM Procedure
Tests of Hypotheses for Mixed Model Analysis of Variance
Dependent Variable: NI

  Source              DF   Type III SS   Mean Square   F Value   Pr > F
  DayLength           1    42.445225     42.445225     6.54      0.0629
  Climate             1    18.232900     18.232900     2.81      0.1692
  DayLength*Climate   1    1.537600      1.537600      0.24      0.6520
  Error               4    25.979950     6.494987
  Error: MS(Hamst(DayLen*Climat))

  Source                 DF   Type III SS   Mean Square   F Value   Pr > F
  Tissue                 1    385.729600    385.729600    133.16    0.0003
  DayLength*Tissue       1    27.562500     27.562500     9.51      0.0368
  Climate*Tissue         1    6.325225      6.325225      2.18      0.2136
  DayLen*Climat*Tissue   1    2.418025      2.418025      0.83      0.4126
  Hamst(DayLen*Climat)   4    25.979950     6.494987      2.24      0.2267
  Error: MS(Error)       4    11.587350     2.89683

7.1.6 Comment on Residuals

It looks like the residuals that you receive from SPSS are not the conditional residuals but rather the marginal residuals (what's left over after accounting for the fixed effects). For checking the assumptions on the bottom-level noise (or σ²), we want to consider the residuals after accounting for the random hamster effects, which are the conditional residuals.

SPSS residuals and predicted values (only 8 predicted values in the plot):

[Figures: Resids plotted against Preds, and a Normal Q-Q Plot (Sample Quantiles against Theoretical Quantiles).]

SAS residuals and predicted values (more predicted values because they include the BLUPs):

The marginal residuals from SAS (this doesn't check our assumptions on ε_ijk):

7.2 Using Mixed Models option

7.2.1 Format of data

For this modeling, we will have the data in the same format as SAS, with one observation per row. Open the split plot hamsters format 2.sav in SPSS using File → Open → Data...

7.2.2 Modeling the data

Choose Analyze → Mixed Models → Linear... and you will see the screen below to set up the subject factor. Once entered, click Continue.

Then choose your factors in the model (both fixed and random for now). Click on Fixed... to set up the fixed factors and press Continue.

Click on Random... to set up the nested hamster effect. Choose the button Build nested terms and check Include intercept. Highlight Hamster and click the down arrow. Click on (Within), highlight DayLength, and press the down arrow. Then click on By*, highlight Climate, and press the down arrow. Finally, click Add, then Continue.

Click the Save button and request to save the residuals and predicted values. This will save new columns in your present data set. Then click OK.

7.2.3 SPSS output for tests

Type III Tests of Fixed Effects (a)
  Source                         Numerator df   Denominator df   F         Sig.
  Intercept                      1              4                96.498    .001
  DayLength                      1              4                6.535     .063
  Climate                        1              4                2.807     .169
  Tissue                         1              4                133.155   .000
  DayLength * Climate            1              4                .237      .652
  DayLength * Tissue             1              4                9.515     .037
  Climate * Tissue               1              4                2.183     .214
  DayLength * Climate * Tissue   1              4                .835      .413
  a. Dependent Variable: NI.

Estimates of Covariance Parameters (a)
  Parameter                                 Estimate   Std. Error
  Residual                                  2.896837   2.048373
  Hamster(DayLength * Climate)   Variance   1.799075   2.514372
  a. Dependent Variable: NI.

[Figure: Resid plotted against Pred.]

And both the SPSS output above and the residuals match the SAS analysis.
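As a check on the covariance parameter estimates, the hamster variance can be recovered from the two error mean squares in the ANOVA tables. For balanced data like these, the expected whole-plot error mean square is σ² + t·σ²_hamster, where t = 2 is the number of Tissue levels per hamster, so a method-of-moments estimate is (MS_hamster - MS_error)/t. The Python sketch below does this arithmetic; the variable names are our own, and the agreement with the Mixed Models estimate relies on the design being balanced and the estimate being positive.

```python
# Mean squares reported earlier for the same data.
ms_hamster = 6.494987  # MS for Hamst(DayLen*Climat), the whole-plot error (PROC GLM)
ms_error = 2.896837    # split-plot error; also the SPSS Residual estimate
n_tissue = 2           # observations (Tissue levels) per hamster

# Method-of-moments estimate of the hamster variance component.
var_hamster = (ms_hamster - ms_error) / n_tissue
print(f"Estimated hamster variance: {var_hamster:.6f}")
```

The result can be compared with the Hamster(DayLength * Climate) row of the Estimates of Covariance Parameters table.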