Fuzzy cross table 1 RUNNING HEAD: FUZZY CROSS TABLE How to create a fuzzy cross table with SPSS A research note on a trick of our trade (version 2.0) Ruben P. Konig Radboud University Nijmegen Please address all your communication concerning this manuscript to Ruben P. Konig, Department of Communication, Faculty of Social Science, Radboud University Nijmegen, P.O.Box 9104, 6500 HE Nijmegen, The Netherlands. E-mail: r.konig@ru.nl. Telephone: +31-24-3615789. Please refer to this paper as: Konig, R. P. (2009). How to create a fuzzy cross table with SPSS: A research note on a trick of our trade (version 2.0) [unpublished paper]. Nijmegen, The Netherlands: Radboud University Nijmegen. Retrieved from http://rkonig.ruhosting.nl/downloads/fuzzycrosstable.pdf, <fill in date of retrieval>. Keywords: Communication research methods, Quantitative methods, Research methods, Significance testing, Statistics, Cross table, Fuzzy coding, SPSS
Fuzzy cross table 2 Abstract In this paper I discuss the need to create a cross table of fuzzy coded variables, and how this may be achieved with the help of SPSS's matrix procedure. I illustrate this with data on job and newspaper subscription from a random sample of the Dutch population. Introduction Suppose you are interested in the relationship between occupational class and newspaper subscription, and suppose that you are aware of the fact that newspaper subscription is not an individual characteristic of your respondents, but a characteristic of their households. At the same time you are aware of the fact that some households consist of only one person, but others consist of more than one person and sometimes more than one of these persons have a job and consequently a score on occupational class. Also suppose that you want to describe the relationship between occupational class and newspaper subscription with a cross table (newspaper subscription and occupational class are nominal variables). How do you proceed? To measure occupational class at household level, you cannot calculate the mean occupational class of the household members with a job, because occupational class is not an interval level variable. This is a situation where 'fuzzy coding' might bring relief (cf. Murtagh, 2005, pp. 77-92). You could count the respondent's household in part in every applicable category of occupational class. That is, if there are n people with a job in a respondent's household, you count 1/n-th of that household in the occupational class of every household member with a job. 1 As a consequence, the frequencies of the households' occupational class may not consist of whole numbers only, and the same is true of the cell counts in a cross table of the households' occupational class and newspaper subscription.
Fuzzy cross table 3 Table 1 Fictitious data to illustrate fuzzy coding of households' occupational class and newspaper subscription household's occupational class household's newspaper subscription lower upper populaity qual- working middle middle upper subscription to no occupational classes of household members class class class class newspapers paper paper paper working class 1 0 0 0 popular 0 1 0 upper middle & lower middle class 0.5.5 0 popular 0 1 0 upper middle class 0 0 1 0 quality 0 0 1 upper & upper middle class 0 0.5.5 quality 0 0 1 working class & working class 1 0 0 0 no paper 1 0 0 upper class 0 0 0 1 quality 0 0 1 upper middle & working class.5 0.5 0 quality & popular 0.5.5 lower middle, working & working class.67.33 0 0 popular 0 1 0 lower middle & upper middle class 0.5.5 0 popular 0 1 0 lower middle class 0 1 0 0 popular 0 1 0 upper middle & upper class 0 0.5.5 quality & popular 0.5.5 lower middle & lower middle class 0 1 0 0 quality 0 0 1 working, lower middle & upper middle class.33.33.33 0 quality & popular 0.5.5 upper middle & lower middle class 0.5.5 0 quality 0 0 1 lower middle class 0 1 0 0 no paper 1 0 0 et cetera... Table 1 is an illustration of fuzzy coding with some fictitious data. In the first column you see what occupational classes the household members belong to and in the second to fifth column you see how this can be coded fuzzily. Since households are not obliged to subscribe to a newspaper and are not restricted to subscribe to only one newspaper, the same principle is applied to newspaper subscription as well in the other columns. Table 2 Cross table of the fictitious data in Table 1 No paper Popular paper Quality paper Total Working class 1.000 2.083.417 3.500 Lower middle class 1.000 2.500 1.667 5.167 Upper middle class.000 1.667 2.667 4.330 Upper class.000.250 1.750 2.000 Total 2.000 6.500 6.500 15.000
Fuzzy cross table 4 Subsequently, for very small data sets a cross table can be created by hand. For the data in Table 1, that would result in Table 2. But how to create a cross table for such fuzzy coded data when you have more records and you want the computer to do it for you? In fact that is fairly simple, but your standard statistical package may not be fitted out for making it seem simple I know that SPSS is not. The columns two to five and seven to nine in Table 1 constitute two so-called indicator matrices (Kroonenberg & Greenacre, 2006, p. 5); respectively for the households' occupational class and for the households' newspaper subscription. If you transpose one of these indicator matrices and multiply it with the other, you end up with the desired cross table. And that is not all. With knowledge of some matrix algebra you can also compute statistics such as Pearson's chi-square and the likelihood ratio chi-square. SPSS syntax If you do not want to do all this by hand, the SPSS syntax in the Appendix may be helpful. It uses SPSS' matrix procedure to compute cell counts, row and column percentages, adjusted standardized residuals, Pearson's chi-square and the likelihood ratio chi-square. It is meticulously annotated with comments to make it possible for everyone to understand what it does and maybe alter it to make some extra computations. The syntax presupposes that two variables (e.g., A and B) are represented in your data as a set of variables (e.g., cata1 to cata4 and catb1 to catb3, respectively) that each represent a category, like the columns in Table 1. At the start of the matrix procedure, these variables are read as the two indicator matrices that will be processed to create the cross table and compute the statistics. To run this syntax just copy it into your own syntax, alter it to fit your specific needs (at least change the bold printed lines) and run it.
Fuzzy cross table 5 Example with real data To illustrate the usefulness of a fuzzy cross table, I used data from a survey among a national random sample of the Dutch population in the year 2000 (n = 825; for further details, see Konig, Jacobs, Hendriks Vettehen, Renckstorf & Beentjes, 2005). Among the questions asked in this survey, were questions about the respondents' employment and their newspaper subscription. I scored their job in three categories: 'no job', 'a public service job', and 'a job in a private company'. As to newspaper subscription, respondents could mention that they had no newspaper subscription, and that they subscribed to one or more newspapers. If the latter held true, the respondents were asked to mention up to two newspaper titles that they subscribed to. This resulted in two variables; first mentioned newspaper, and second mentioned newspaper. Both were coded into four categories as 'no newspaper', 'a national popular paper', 'a national quality paper', and 'a regional paper'. Now, if I want to use these data to study the relationship between people's job and their newspaper subscription, I am presented with a problem. The survey did not record why respondents mentioned one paper first and another second. Thus, I do not know whether the respondents considered one of their subscriptions to be more important, or not and if so, which one. I am therefore obliged to use the data on both newspaper subscriptions. But simply presenting two cross tables does not suffice, as is illustrated in Tables 3 and 4. Table 3 shows a significant relationship (χ 2 = 17.4, df = 6, p =.008), but Table 4 shows a nonsignificant one (χ 2 = 2.8, df = 6, p =.838), and of course, we do not know how to interpret these tables.
Fuzzy cross table 6 Table 3 Job by first mentioned newspaper subscription (percentages in brackets) No paper Popular paper Quality paper Regional paper Total No job 72 (38.1) 52 (45.2) 50 (33.8) 152 (40.8) 326 (39.5) Public service 30 (15.9) 15 (13.0) 45 (30.4) 78 (20.9) 168 (20.4) Private company 87 (46.0) 48 (41.7) 53 (35.8) 143 (38.3) 331 (40.1) Total 189 (100.0) 115 (100.0) 148 (100.0) 373 (100.0) 825 (100.0) Table 4 Job by second mentioned newspaper subscription (percentages in brackets) No paper Popular paper Quality paper Regional paper Total No job 275 (39.9) 11 (33.3) 16 (39.0) 24 (38.7) 326 (39.5) Public service 134 (19.4) 9 (27.3) 9 (22.0) 16 (25.8) 168 (20.4) Private company 280 (40.6) 13 (39.4) 16 (39.0) 22 (35.5) 331 (40.1) Total 689 (100.0) 33 (100.0) 41 (100.0) 62 (100.0) 825 (100.0) Therefore I created a fuzzy cross table as explained above. It is displayed in Table 5. This table shows a significant relationship between job and newspaper subscription (χ 2 = 13.3, df = 6, p =.038) and may be interpreted as follows. People without a job are most inclined to subscribe a popular newspaper and are least inclined to subscribe a quality newspaper. People working in a private company are most likely not to subscribe a newspaper at all, and are least likely to subscribe a quality newspaper. And finally, in contrast with the others, people with a job in public service are most likely to subscribe a quality newspaper, and least likely to subscribe a popular newspaper. Public service employees stand out for their preference for quality newspapers (adjusted standardized residual = 2.753; all other adjusted standardized residuals < 1.96). If you serve the public, if you work for the common good, maybe you are more interested in quality information about your society.
Fuzzy cross table 7 Table 5 Fuzzy cross table of job by newspaper subscription (percentages in brackets) No paper Popular paper Quality paper Regional paper Total No job 72 (38.1) 51 (44.2) 51 (34.8) 152 (40.6) 326 (39.5) Public service 30 (15.9) 16.5 (14.3) 42 (28.7) 79.5 (21.3) 168 (20.4) Private company 87 (46.0) 48 (41.6) 53.5 (36.5) 142.5 (38.1) 331 (40.1) Total 189 (100.0) 115 (100.0) 146 (100.000) 374 (100.0) 825 (100.0) Discussion In this manuscript I showed the need to sometimes create a fuzzy cross table, how to do that with the help of SPSS, and I illustrated the use of such a table with the relationship between job and newspaper subscription. The results show that public service employees in the Netherlands in the year 2000 were more inclined than other people to read quality newspapers. Footnotes 1 Here I am assuming that the occupational class of every household member is equally important in determining the occupational class of the household as a whole. That is not necessarily a good assumption, but for the sake of a simple argument in explaining fuzzy coding it suffices. Of course you can weigh the class of the different household members in any particular way that suits your needs. References Konig, R., Jacobs, E., Hendriks Vettehen, P., Renckstorf, K., & Beentjes, H. (2005). Media use in the Netherlands 2000: Documentation of a national survey. Den Haag, The Netherlands: Data Archiving and Networked Services.
Fuzzy cross table 8 Kroonenberg, P. M., & Greenacre, M. J. (2006). Correspondence analysis. In S. Kotz, C. B. Read, N. Balakrishnan, & B. Vidakovic (Eds.), Encyclopedia of Statistical Sciences [Online edition]. John Wiley. Retrieved from http://www.mrw.interscience.wiley.com/emrw/9780471667193/ess/article/ess6018/cu rrent/pdf, January 16, 2009. Murtagh, F. (2005). Correspondence analysis and data coding with Java and R. Boca Raton, FL: Chapman & Hall/CRC. Appendix comment matrix procedure to create a "fuzzy cross table". matrix. get A /*indicator matrix A*/ /file=* /variables=cata1 cata2 cata3 cata4. /* use any number of categories */ get B /*indicator matrix B*/ /file=* /variables=catb1 catb2 catb3. /* use any number of categories */ compute observed=t(a)*b. /* cross table of A and B */ compute rowsum=rsum(observed). /* column vector with row totals */ compute colsum=csum(observed). /* row vector with column totals */ compute N=rsum(colsum). /* total number of observations */ compute expected=(rowsum*colsum)&/n. /* expected frequencies */ compute chisquar=rsum(csum(((observed-expected)&**2)&/expected)). /* Pearson's chi-square */ compute df=(ncol(observed)-1)*(nrow(observed)-1). /* degrees of freedom */ compute as=make(nrow(observed),ncol(observed),0). /* preparation for computation of adjusted standardized residual (asresid) */ compute rowperc=make(nrow(observed),ncol(observed),0). /* preparation for computation of row percentages*/ compute colperc=make(nrow(observed),ncol(observed),0). /* preparation for computation of column percentages*/ compute tosmall=0. /* preparation for counting cells with expected frequency < 5 */ compute G2=0. /* preparation for computation of likelihood ratio chi-square */ loop i=1 to nrow(expected) by 1. /* process the rows one by one */ + loop j=1 to ncol(expected) by 1. /* within row, process the columns one by one */ + compute as(i,j)=((rowsum(i)*colsum(j)*(1-(rowsum(i)/n))*(1-(colsum(j)/n)))/n)&**.5. + /* part of computation of adjusted standardized residual */ + do if expected(i,j)<5. + compute tosmall=tosmall+1. /* count cells with expected frequencies < 5 */ + end if. + do if observed(i,j) <> 0. /* if cell empty, ln(observed/expected) not defined, */ + /* but lim n->0 n*ln(n) = 0, so the contribution of this cell is 0 */ + compute G2=G2+(2*observed(i,j)*ln(observed(i,j)/expected(i,j))). + /* cell contribution to likelihood ratio chi-square */ + compute rowperc(i,j)=100*observed(i,j)/rowsum(i). /* row percentage */ + compute colperc(i,j)=100*observed(i,j)/colsum(j). /* column percentage */ + end if. + end loop. end loop. compute asresid=(observed-expected)&/as. /* adjusted standardized residual */ print {observed,rowsum;colsum,n} /* display the results */
Fuzzy cross table 9 /title="crosstable of (fuzzy coded) variables A and B" /clabels="catb1" "catb2" "catb3" "total" /* choose category labels */ /rlabel="cata1" "cata2" "cata3" "cata4" "total". /* for columns and rows */ print {chisquar,df,(1-chicdf(chisquar,df));g2,df,(1-chicdf(g2,df))} /title="chi-square-test of statistical independence" /clabels="chi2" "df" "p" /rlabels="pearson" "likelihood ratio". print {tosmall,(tosmall*100)/(nrow(expected)*ncol(expected))} /title="number of cells with expected frequencies < 5" /clabels="count" "%". print {cmin(rmin(expected))} /title="smallest expected frequency". print {rowperc,make(nrow(observed),1,100)} /title="row percentages" /clabels="catb1" "catb2" "catb3" "total" /* choose category labels */ /rlabel="cata1" "cata2" "cata3" "cata4". /* for columns and rows */ print {colperc;make(1,ncol(observed),100)} /title="column percentages" /clabels="catb1" "catb2" "catb3" /* choose category labels */ /rlabel="cata1" "cata2" "cata3" "cata4" "total". /* for columns and rows */ print asresid /title="adjusted standardized residuals" /clabels="catb1" "catb2" "catb3" /* choose category labels */ /rlabel="cata1" "cata2" "cata3" "cata4". /* for columns and rows */ end matrix.