An Example of Using inter5.exe to Obtain the Graph of an Interaction

This example covers the general use of inter5.exe to produce data from values inserted into a regression equation, which can then be plotted in SPSS. It also covers two particular points: how to handle an interaction between a numeric variable and a categorical variable when the latter has more than two categories, and the difference between using numeric variables when they have and haven't been centred at the mean.

The example is based on bank3.sav, a modified version of the classic SPSS dataset bank.sav. The dependent variable is salary in, let's say, 1000s of dollars. The independent variables are edrec, a categorical version of years of education, and age in years; work (work experience) is included as a covariate.

edrec

Value   Label   Frequency   Percent   Valid Percent   Cumulative Percent
0       8-12          243      51.3            51.3                 51.3
1       13-15         122      25.7            25.7                 77.0
2       16+           109      23.0            23.0                100.0
Total                 474     100.0           100.0

Descriptive Statistics

Variable                   N     Minimum   Maximum   Mean      Std. Deviation
age   Age of employee      474   23.00     64.50     37.1861   11.78724
work  Work experience      474     .00     39.67      7.9886    8.71541
Valid N (listwise)         474

The question is whether the relationship between age and salary is the same for all the education groups. The first analysis will be carried out without centring age.

Age Uncentred

glm salary by edrec with age work/
  print=parameters/
  design=work age edrec age*edrec.
Tests of Between-Subjects Effects
Dependent Variable: salary

Source            Type III Sum of Squares    df    Mean Square    F          Sig.
Corrected Model    17233.780 (a)               6     2872.297      127.347   .000
Intercept          25185.237                   1    25185.237     1116.620   .000
work                 677.522                   1      677.522       30.039   .000
age                   35.119                   1       35.119        1.557   .213
edrec                324.335                   2      162.168        7.190   .001
edrec * age          208.184                   2      104.092        4.615   .010
Error              10533.135                 467       22.555
Total             788878.989                 474
Corrected Total    27766.915                 473

a. R Squared = .621 (Adjusted R Squared = .616)

The results in this table show a significant interaction between edrec and age.

Parameter Estimates
Dependent Variable: salary

                                                         95% Confidence Interval
Parameter          B         Std. Error   t        Sig.   Lower Bound   Upper Bound
Intercept          45.375    2.675        16.963   .000    40.119        50.631
work                 .236     .043         5.481   .000      .151          .320
age                  .101     .079         1.279   .202     -.054          .256
[edrec=0]          -7.744    2.665        -2.906   .004   -12.982        -2.507
[edrec=1]          -2.318    3.004         -.772   .441    -8.221         3.584
[edrec=2]           0 (a)     .             .       .        .             .
[edrec=0] * age     -.205     .074        -2.789   .005     -.350         -.061
[edrec=1] * age     -.249     .084        -2.953   .003     -.414         -.083
[edrec=2] * age     0 (a)     .             .       .        .             .

a. This parameter is set to zero because it is redundant.

The estimate for age in this table shows that there is a positive relationship (.101) between age and salary for the highest edrec group (which GLM has made the reference category), although this slope is not significantly different from zero (p = .202). The estimate for the interaction between the lowest edrec category, which we'll call lowed, and age (-.205) shows that the slope for that group is .205 less than .101, i.e., -.104. Similarly, the slope for the middle education category, meded, is .101 - .249 = -.148. The p-values for the interaction estimates (.005 and .003) show that both slopes are significantly different from the slope for the highest education group.

We could rerun the analysis with different reference categories for edrec, in order to see if the slopes for the two lower categories are significantly different from zero, but our interest here is in producing a graph of the interaction.
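The slope arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch: the coefficients are copied from the Parameter Estimates table, while the function and variable names are made up for the example and have nothing to do with inter5.exe or SPSS.

```python
# Coefficients copied from the Parameter Estimates table (age uncentred).
AGE_SLOPE_REFERENCE = 0.101      # age slope for edrec=2 (16+), the reference group
INTERACTION = {                  # [edrec=k] * age adjustments to that slope
    "8-12 yrs": -0.205,
    "13-15 yrs": -0.249,
    "16+ yrs": 0.0,              # redundant parameter, set to zero by GLM
}


def simple_slopes(reference_slope, adjustments):
    """Return the slope of salary on age within each education group."""
    return {group: round(reference_slope + adj, 3)
            for group, adj in adjustments.items()}


slopes = simple_slopes(AGE_SLOPE_REFERENCE, INTERACTION)
for group, slope in slopes.items():
    print(f"{group:>9}: {slope:+.3f}")
```

The slopes for the two lower groups come out negative and the slope for the reference group stays at .101, which is the pattern the graph is meant to show.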
Having started up inter5.exe, we can enter the coefficients from the parameter estimates table as follows. The entered numbers or names follow each prompt. A blank beside a prompt indicates that Enter was pressed without a number being entered.

Because work isn't involved in the interaction, we'll enter it at this stage and ask that the mean of work (approximately 8) be used as its value when the equation is evaluated:

Now begin entering the data...
Number of variables not involved in any interaction...
i.e., in the model as main effects only [default=0]? 1
Coefficient for variable 1? .236
Value for variable 1? 8

The constant (intercept) and the coefficients for age, the two dummy variables for edrec, and the two interaction terms are now entered.

Entering the first model:
In the following, enter zero (or just press Enter) if the variable or term is not included in the model.
There must be at least one variable (variable a).
Enter the constant [default: 0.0000]? 45.375
variable(a): b1 [default: 0.0000]? .101
variable(b): b2 [default: 0.0000]? -7.744
variable(c): b3 [default: 0.0000]? -2.318
variable(d): b4 [default: 0.0000]?
    (There isn't a fourth variable involved in the interaction, so nothing is entered here.)
axb: b5 [default: 0.0000]? -.205
axc: b6 [default: 0.0000]? -.249
bxc: b8 [default: 0.0000]?
    (Nothing is entered here either.)
axbxc: b11 [default: 0.0000]?
    (Nor here.)
The high and low values which the program will insert in the equation are now entered.

low & high of a? (enter together, separated by a comma)... 25,49
    (The approximate mean of age, 37, minus and plus the approximate SD, 12.)
low & high of b? (enter together, separated by a comma)... 0,1
low & high of c? (enter together, separated by a comma)... 0,1

Note that if 0 and 1 are entered as the low and high values, inter5.exe doesn't calculate their mean, as it does with numeric variables (for age it adds the midpoint, 37), and it writes 0 and 1 as the codes in the output file, rather than -1 and 1.

The program shows you the equation that you entered, so you can check it, and saves it for future reference.

The equation is: 45.375 + .101a - 7.744b - 2.318c - .205ab - .249ac + 0bc
The model has been saved in file c:\qb45\mintdem1.txt

The program asks for the names to use in the file of predicted values which it writes out.

Name of variable a [default: A]? age
Name of variable b [default: B]? lowed
Name of variable c [default: C]? meded
Name of dependent variable [default: Y]? salary
The current file has been saved as c:\qb45\intdemo1.txt

The file intdemo1.txt looks like this:
model  analysis  age  lowed  meded  A_value  B_value  C_value  salary
1      1         -1   0      0      25.0000  0.0000   0.0000   49.7880
1      1         -1   0      1      25.0000  0.0000   1.0000   41.2450
1      1         -1   1      0      25.0000  1.0000   0.0000   36.9190
1      1         -1   1      1      25.0000  1.0000   1.0000   28.3760
1      1          0   0      0      37.0000  0.0000   0.0000   51.0000
1      1          0   0      1      37.0000  0.0000   1.0000   39.4690
1      1          0   1      0      37.0000  1.0000   0.0000   35.6710
1      1          0   1      1      37.0000  1.0000   1.0000   24.1400
1      1          1   0      0      49.0000  0.0000   0.0000   52.2120
1      1          1   0      1      49.0000  0.0000   1.0000   37.6930
1      1          1   1      0      49.0000  1.0000   0.0000   34.4230
1      1          1   1      1      49.0000  1.0000   1.0000   19.9040

Notice that inter5.exe, in its ignorance of the fact that lowed and meded together represent a three-category variable, and in its keenness to provide predicted values for all combinations of the values of the variables, has produced predicted values for cases which have 1 for both variables, i.e., are in both groups, which obviously can't be allowed in an independent groups design. This is a problem which we will fix in SPSS.

Notice also that, in order to specify the three levels of edrec, we would need to use both lowed and meded, when we would like to specify just one variable in the graph. This is another thing we'll attend to in SPSS.

The Text Import Wizard is used to read the data into SPSS.

We need to remove the cases which have 1 for both lowed and meded, and create an edrec variable for the plot. This syntax does the job.
* Get rid of the impossible combinations produced by inter5.exe
  (i.e., cases which are in both the low and medium education groups).
select if (not(lowed eq 1 and meded eq 1)).

* Produce an edrec variable from the combinations of the dummy variable
  values for lowed and meded (i.e., 0,0; 0,1; 1,0).
compute edrec=lowed*10 + meded.
recode edrec (0=3)(1=2)(10=1).

* Be careful with the recode.
* Remember that lowed=0 and meded=0 are the codes that subjects in the
  highest education group will have.
print format edrec (f1).
value labels edrec 1 '8-12 yrs' 2 '13-15 yrs' 3 '16+ yrs'.
execute.

The dataset now looks like this (the model and analysis variables are not shown):

The graph can now be produced:
As seen from the equation, the slope of the relationship between age and salary is positive for the subjects in the highest edrec group, and negative for the other two groups. It is also noteworthy that salary is much higher for subjects in the highest education category, regardless of age.
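The whole sequence (evaluate the equation at each combination of values, drop the impossible combinations, and collapse the two dummy variables into one edrec variable) can be sketched in Python as a cross-check on the numbers in intdemo1.txt. The coefficients and values are those entered into inter5.exe above; the function and variable names are illustrative assumptions, and the code makes no claim about how inter5.exe itself is written.

```python
from itertools import product

# Values typed into inter5.exe in the uncentred example.
CONSTANT = 45.375
B_WORK, WORK_VALUE = 0.236, 8        # covariate held at (roughly) its mean
B_AGE, B_LOWED, B_MEDED = 0.101, -7.744, -2.318
B_AGE_LOWED, B_AGE_MEDED = -0.205, -0.249


def predict(age, lowed, meded):
    """Evaluate the regression equation for one combination of values."""
    return (CONSTANT + B_WORK * WORK_VALUE
            + B_AGE * age + B_LOWED * lowed + B_MEDED * meded
            + B_AGE_LOWED * age * lowed + B_AGE_MEDED * age * meded)


# Map each valid (lowed, meded) pair onto the edrec codes used in the SPSS recode.
EDREC = {(1, 0): 1, (0, 1): 2, (0, 0): 3}
LABELS = {1: "8-12 yrs", 2: "13-15 yrs", 3: "16+ yrs"}

rows = []
for age, lowed, meded in product((25, 37, 49), (0, 1), (0, 1)):
    if lowed == 1 and meded == 1:
        continue  # impossible: nobody is in both education groups
    rows.append({"age": age,
                 "edrec": EDREC[(lowed, meded)],
                 "salary": round(predict(age, lowed, meded), 3)})

for row in sorted(rows, key=lambda r: (r["edrec"], r["age"])):
    print(f'{LABELS[row["edrec"]]:>9}  age {row["age"]}  salary {row["salary"]:.3f}')
```

Nine rows remain, three ages for each of the three education groups, which is what the plot of salary against age with separate lines for edrec requires.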
Age Centred

In this analysis, both age and work are centred at the mean. The reason for centring age is so that the effects of edrec are shown at the mean of age, which is now zero, rather than at zero years of age, which could give nonsensical results. There is less reason to centre work but, when it is centred, the intercept shows the salary at the mean of age and work (and the reference category of edrec, which is the highest group), so it is more meaningful.

compute age=age - 37.1861.
compute work=work - 7.9886.
glm salary by edrec with age work/
  print=parameters/
  design=work age edrec age*edrec.

In entering the information into inter5.exe, zero is given as the value of work, because its mean is now zero:

Number of variables not involved in any interaction...
i.e., in the model as main effects only [default=0]? 1
Coefficient for variable 1? .236
Value for variable 1? 0

The coefficients are entered in the same way as before:

Enter the constant [default: 0.0000]? 51.006
variable(a): b1 [default: 0.0000]? .101
variable(b): b2 [default: 0.0000]? -15.373
variable(c): b3 [default: 0.0000]? -11.571
axb: b5 [default: 0.0000]? -.205
axc: b6 [default: 0.0000]? -.249
When entering the low and high values for age, remember that the mean is now zero, so the mean minus and plus the SD is -12 and 12.

low & high of a? (enter together, separated by a comma)... -12,12
low & high of b? (enter together, separated by a comma)... 0,1
low & high of c? (enter together, separated by a comma)... 0,1

The resulting text file is read into SPSS as before, and the SPSS commands are run to prepare the dataset for the graph, so that it looks like this:

Give or take a bit of rounding error, the values are the same as before, and the graph looks the same:
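The agreement between the centred and uncentred predicted values can also be checked directly. The sketch below evaluates both equations at matching points (uncentred ages 25, 37 and 49 with work = 8; centred ages -12, 0 and 12 with work = 0) and reports the largest difference. The coefficients are the ones entered into inter5.exe in the two runs; the names used are illustrative only.

```python
# Coefficients as entered into inter5.exe in the two runs.
UNCENTRED = {"const": 45.375, "age": 0.101, "lowed": -7.744, "meded": -2.318,
             "age_lowed": -0.205, "age_meded": -0.249, "work": 0.236}
# Centring changes only the constant and the two dummy-variable coefficients.
CENTRED = dict(UNCENTRED, const=51.006, lowed=-15.373, meded=-11.571)


def predict(coef, age, lowed, meded, work):
    """Evaluate one regression equation at the given values."""
    return (coef["const"] + coef["work"] * work
            + coef["age"] * age
            + coef["lowed"] * lowed + coef["meded"] * meded
            + coef["age_lowed"] * age * lowed
            + coef["age_meded"] * age * meded)


GROUPS = {"8-12 yrs": (1, 0), "13-15 yrs": (0, 1), "16+ yrs": (0, 0)}
# Each uncentred age paired with its centred counterpart (mean taken as 37).
AGE_PAIRS = [(25, -12), (37, 0), (49, 12)]

max_gap = 0.0
for label, (lowed, meded) in GROUPS.items():
    for raw_age, centred_age in AGE_PAIRS:
        raw = predict(UNCENTRED, raw_age, lowed, meded, work=8)
        cen = predict(CENTRED, centred_age, lowed, meded, work=0)
        max_gap = max(max_gap, abs(raw - cen))
        print(f"{label:>9}  age {raw_age}: uncentred {raw:.3f}  centred {cen:.3f}")

print(f"largest difference: {max_gap:.3f}")
```

The small discrepancies that remain come from rounding: the coefficients are entered to three decimal places, and 37 and 8 are used in place of the exact means 37.1861 and 7.9886.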
Once you have the graph in SPSS, you can edit it to get rid of unnecessary decimal places, alter axis labels, add titles, alter the line types and weights, and so on. You can also attach other value labels to the data file so that they will be incorporated in the graph.

Alan Taylor
Department of Psychology
8th September 2006