How to create a fuzzy cross table with SPSS. A research note on a trick of our trade. (version 2.0) Ruben P. Konig. Radboud University Nijmegen

Similar documents
Correctly Compute Complex Samples Statistics

- 1 - Fig. A5.1 Missing value analysis dialog box

Crosstabs Notes Output Created 17-Mai :40:54 Comments Input

Analysis of Complex Survey Data with SAS

NON-CALCULATOR ARITHMETIC

Feature Selection Using Modified-MCA Based Scoring Metric for Classification

Data analysis using Microsoft Excel

SPSS Basics for Probability Distributions

Research Methods for Business and Management. Session 8a- Analyzing Quantitative Data- using SPSS 16 Andre Samuel

Statistical Package for the Social Sciences INTRODUCTION TO SPSS SPSS for Windows Version 16.0: Its first version in 1968 In 1975.

IBM SPSS Categories 23

MICROSOFT EXCEL BASIC FORMATTING

4. Descriptive Statistics: Measures of Variability and Central Tendency

Missing Data: What Are You Missing?

Math background. 2D Geometric Transformations. Implicit representations. Explicit representations. Read: CS 4620 Lecture 6

[/TTEST [PERCENT={5}] [{T }] [{DF } [{PROB }] [{COUNTS }] [{MEANS }]] {n} {NOT} {NODF} {NOPROB}] {NOCOUNTS} {NOMEANS}

Data Dictionary. National Center for O*NET Development

Chapter 3. Finding Sums. This chapter covers procedures for obtaining many of the summed values that are

Introduction. About this Document. What is SPSS. ohow to get SPSS. oopening Data

Subset Selection in Multiple Regression

Nuts and Bolts Research Methods Symposium

Operators. Java operators are classified into three categories:

11. Chi Square. Calculate Chi Square for contingency tables. A Chi Square is used to analyze categorical data. It compares observed

Updates and Errata for Statistical Data Analytics (1st edition, 2015)

An introduction to SPSS

Product Catalog. AcaStat. Software

Introduction to SPSS on the Macintosh. Scott Patterson,Ph.D. Broadcast and Electronic Communication Arts San Francisco State University.

Data Analysis using SPSS

Dual-Frame Sample Sizes (RDD and Cell) for Future Minnesota Health Access Surveys

Fractional. Design of Experiments. Overview. Scenario

Correctly Compute Complex Samples Statistics

SPSS TRAINING SPSS VIEWS

Source df SS MS F A a-1 [A] [T] SS A. / MS S/A S/A (a)(n-1) [AS] [A] SS S/A. / MS BxS/A A x B (a-1)(b-1) [AB] [A] [B] + [T] SS AxB

CSE 547: Machine Learning for Big Data Spring Problem Set 2. Please read the homework submission policies.

A. Petr and a calendar

Applied Regression Modeling: A Business Approach

2016 SPSS Workshop UBC Research Commons

Fact Sheet No.1 MERLIN

Matrices. Chapter Matrix A Mathematical Definition Matrix Dimensions and Notation

WORKSHOP: Using the Health Survey for England, 2014

IMPORTANT. Registration Settings: SERIAL NUMBER: COMPUTER ID: REGISTRATION NUMBER:

Approximate Reasoning with Fuzzy Booleans

Describe the two most important ways in which subspaces of F D arise. (These ways were given as the motivation for looking at subspaces.

INTRODUCTION...3 OVERALL INTERNET USAGE...3 PRICING...7 CONCLUSION...10

OUTLINES. Variable names in MATLAB. Matrices, Vectors and Scalar. Entering a vector Colon operator ( : ) Mathematical operations on vectors.

LIST OF TABLES. Page Title No.

Exercise Set Decide whether each matrix below is an elementary matrix. (a) (b) (c) (d) Answer:

Binary IFA-IRT Models in Mplus version 7.11

Data Dictionary. National Center for O*NET Development

Generalized Additive Model

Frequency Tables. Chapter 500. Introduction. Frequency Tables. Types of Categorical Variables. Data Structure. Missing Values

Discussion 2C Notes (Week 3, January 21) TA: Brian Choi Section Webpage:

Tutorial #1: Using Latent GOLD choice to Estimate Discrete Choice Models

A Simple Guide to Using SPSS (Statistical Package for the. Introduction. Steps for Analyzing Data. Social Sciences) for Windows

Chapter 2. Basic Operations. you through the routine procedures that you will use nearly every time you work with SPSS.

Today Function. Note: If you want to retrieve the date and time that the computer is set to, use the =NOW() function.

SOST 201 September 20, Stem-and-leaf display 2. Miscellaneous issues class limits, rounding, and interval width.

Package chi2x3way. R topics documented: January 23, Type Package

Table of Contents (As covered from textbook)

Multivariate Normal Random Numbers

DOWNLOAD PDF MICROSOFT EXCEL ALL FORMULAS LIST WITH EXAMPLES

Shuli s Math Problem Solving Column

GATHERING SENSITIVE HEALTHCARE INFORMATION USING SOCIAL ENGINEERING TECHNIQUES

H. W. Kuhn. Bryn Mawr College

Bootstrapped and Means Trimmed One Way ANOVA and Multiple Comparisons in R

Frequency, proportional, and percentage distributions.

Brief Guide on Using SPSS 10.0

STATISTICS (STAT) Statistics (STAT) 1

COSC2430: Programming and Data Structures Sparse Matrix Multiplication: Search and Sorting Algorithms

Lecture 5. Logic I. Statement Logic

ANALYSIS OF NOMINAL DATA MULTI-WAY CONTINGENCY TABLE

Creating and using Moodle Rubrics

Generalized least squares (GLS) estimates of the level-2 coefficients,

Basic concepts and terms

Package influence.sem

ANSWERS -- Prep for Psyc350 Laboratory Final Statistics Part Prep a

Ch6: The Normal Distribution

Choosing the Right Procedure

Introduction (SPSS) Opening SPSS Start All Programs SPSS Inc SPSS 21. SPSS Menus

Choosing the Right Procedure

Enter your UID and password. Make sure you have popups allowed for this site.

Hierarchical Mixture Models for Nested Data Structures

A Beginner's Guide to. Randall E. Schumacker. The University of Alabama. Richard G. Lomax. The Ohio State University. Routledge

Creating a data file and entering data

FUZZY INFERENCE SYSTEMS

CHIIAB - a "poor nan's" shortcut to computer processing of linguistic data

Create your first workbook

I. This material refers to mathematical methods designed for facilitating calculations in matrix

A gentle introduction to Matlab

Introduction to Operations Research Prof. G. Srinivasan Department of Management Studies Indian Institute of Technology, Madras

Data Mining. ❷Chapter 2 Basic Statistics. Asso.Prof.Dr. Xiao-dong Zhu. Business School, University of Shanghai for Science & Technology

How to Use a Statistical Package

Introductory Applied Statistics: A Variable Approach TI Manual

Year 7 Set 1 : Unit 1 : Number 1. Learning Objectives: Level 5

STEPS FOR USING TURNING POINT:

Multi Attribute Decision Making Approach for Solving Intuitionistic Fuzzy Soft Matrix

Chapter2. Current Status of Information and Communications

Introduction to SPSS

Learning and Development. UWE Staff Profiles (USP) User Guide

2D/3D Geometric Transformations and Scene Graphs

Transcription:

Fuzzy cross table 1 RUNNING HEAD: FUZZY CROSS TABLE How to create a fuzzy cross table with SPSS A research note on a trick of our trade (version 2.0) Ruben P. Konig Radboud University Nijmegen Please address all your communication concerning this manuscript to Ruben P. Konig, Department of Communication, Faculty of Social Science, Radboud University Nijmegen, P.O.Box 9104, 6500 HE Nijmegen, The Netherlands. E-mail: r.konig@ru.nl. Telephone: +31-24-3615789. Please refer to this paper as: Konig, R. P. (2009). How to create a fuzzy cross table with SPSS: A research note on a trick of our trade (version 2.0) [unpublished paper]. Nijmegen, The Netherlands: Radboud University Nijmegen. Retrieved from http://rkonig.ruhosting.nl/downloads/fuzzycrosstable.pdf, <fill in date of retrieval>. Keywords: Communication research methods, Quantitative methods, Research methods, Significance testing, Statistics, Cross table, Fuzzy coding, SPSS

Fuzzy cross table 2 Abstract In this paper I discuss the need to create a cross table of fuzzy coded variables, and how this may be achieved with the help of SPSS's matrix procedure. I illustrate this with data on job and newspaper subscription from a random sample of the Dutch population. Introduction Suppose you are interested in the relationship between occupational class and newspaper subscription, and suppose that you are aware of the fact that newspaper subscription is not an individual characteristic of your respondents, but a characteristic of their households. At the same time you are aware of the fact that some households consist of only one person, but others consist of more than one person and sometimes more than one of these persons have a job and consequently a score on occupational class. Also suppose that you want to describe the relationship between occupational class and newspaper subscription with a cross table (newspaper subscription and occupational class are nominal variables). How do you proceed? To measure occupational class at household level, you cannot calculate the mean occupational class of the household members with a job, because occupational class is not an interval level variable. This is a situation where 'fuzzy coding' might bring relief (cf. Murtagh, 2005, pp. 77-92). You could count the respondent's household in part in every applicable category of occupational class. That is, if there are n people with a job in a respondent's household, you count 1/n-th of that household in the occupational class of every household member with a job. 1 As a consequence, the frequencies of the households' occupational class may not consist of whole numbers only, and the same is true of the cell counts in a cross table of the households' occupational class and newspaper subscription.

Fuzzy cross table 3 Table 1 Fictitious data to illustrate fuzzy coding of households' occupational class and newspaper subscription household's occupational class household's newspaper subscription lower upper populaity qual- working middle middle upper subscription to no occupational classes of household members class class class class newspapers paper paper paper working class 1 0 0 0 popular 0 1 0 upper middle & lower middle class 0.5.5 0 popular 0 1 0 upper middle class 0 0 1 0 quality 0 0 1 upper & upper middle class 0 0.5.5 quality 0 0 1 working class & working class 1 0 0 0 no paper 1 0 0 upper class 0 0 0 1 quality 0 0 1 upper middle & working class.5 0.5 0 quality & popular 0.5.5 lower middle, working & working class.67.33 0 0 popular 0 1 0 lower middle & upper middle class 0.5.5 0 popular 0 1 0 lower middle class 0 1 0 0 popular 0 1 0 upper middle & upper class 0 0.5.5 quality & popular 0.5.5 lower middle & lower middle class 0 1 0 0 quality 0 0 1 working, lower middle & upper middle class.33.33.33 0 quality & popular 0.5.5 upper middle & lower middle class 0.5.5 0 quality 0 0 1 lower middle class 0 1 0 0 no paper 1 0 0 et cetera... Table 1 is an illustration of fuzzy coding with some fictitious data. In the first column you see what occupational classes the household members belong to and in the second to fifth column you see how this can be coded fuzzily. Since households are not obliged to subscribe to a newspaper and are not restricted to subscribe to only one newspaper, the same principle is applied to newspaper subscription as well in the other columns. Table 2 Cross table of the fictitious data in Table 1 No paper Popular paper Quality paper Total Working class 1.000 2.083.417 3.500 Lower middle class 1.000 2.500 1.667 5.167 Upper middle class.000 1.667 2.667 4.330 Upper class.000.250 1.750 2.000 Total 2.000 6.500 6.500 15.000

Fuzzy cross table 4 Subsequently, for very small data sets a cross table can be created by hand. For the data in Table 1, that would result in Table 2. But how to create a cross table for such fuzzy coded data when you have more records and you want the computer to do it for you? In fact that is fairly simple, but your standard statistical package may not be fitted out for making it seem simple I know that SPSS is not. The columns two to five and seven to nine in Table 1 constitute two so-called indicator matrices (Kroonenberg & Greenacre, 2006, p. 5); respectively for the households' occupational class and for the households' newspaper subscription. If you transpose one of these indicator matrices and multiply it with the other, you end up with the desired cross table. And that is not all. With knowledge of some matrix algebra you can also compute statistics such as Pearson's chi-square and the likelihood ratio chi-square. SPSS syntax If you do not want to do all this by hand, the SPSS syntax in the Appendix may be helpful. It uses SPSS' matrix procedure to compute cell counts, row and column percentages, adjusted standardized residuals, Pearson's chi-square and the likelihood ratio chi-square. It is meticulously annotated with comments to make it possible for everyone to understand what it does and maybe alter it to make some extra computations. The syntax presupposes that two variables (e.g., A and B) are represented in your data as a set of variables (e.g., cata1 to cata4 and catb1 to catb3, respectively) that each represent a category, like the columns in Table 1. At the start of the matrix procedure, these variables are read as the two indicator matrices that will be processed to create the cross table and compute the statistics. To run this syntax just copy it into your own syntax, alter it to fit your specific needs (at least change the bold printed lines) and run it.

Fuzzy cross table 5 Example with real data To illustrate the usefulness of a fuzzy cross table, I used data from a survey among a national random sample of the Dutch population in the year 2000 (n = 825; for further details, see Konig, Jacobs, Hendriks Vettehen, Renckstorf & Beentjes, 2005). Among the questions asked in this survey, were questions about the respondents' employment and their newspaper subscription. I scored their job in three categories: 'no job', 'a public service job', and 'a job in a private company'. As to newspaper subscription, respondents could mention that they had no newspaper subscription, and that they subscribed to one or more newspapers. If the latter held true, the respondents were asked to mention up to two newspaper titles that they subscribed to. This resulted in two variables; first mentioned newspaper, and second mentioned newspaper. Both were coded into four categories as 'no newspaper', 'a national popular paper', 'a national quality paper', and 'a regional paper'. Now, if I want to use these data to study the relationship between people's job and their newspaper subscription, I am presented with a problem. The survey did not record why respondents mentioned one paper first and another second. Thus, I do not know whether the respondents considered one of their subscriptions to be more important, or not and if so, which one. I am therefore obliged to use the data on both newspaper subscriptions. But simply presenting two cross tables does not suffice, as is illustrated in Tables 3 and 4. Table 3 shows a significant relationship (χ 2 = 17.4, df = 6, p =.008), but Table 4 shows a nonsignificant one (χ 2 = 2.8, df = 6, p =.838), and of course, we do not know how to interpret these tables.

Fuzzy cross table 6 Table 3 Job by first mentioned newspaper subscription (percentages in brackets) No paper Popular paper Quality paper Regional paper Total No job 72 (38.1) 52 (45.2) 50 (33.8) 152 (40.8) 326 (39.5) Public service 30 (15.9) 15 (13.0) 45 (30.4) 78 (20.9) 168 (20.4) Private company 87 (46.0) 48 (41.7) 53 (35.8) 143 (38.3) 331 (40.1) Total 189 (100.0) 115 (100.0) 148 (100.0) 373 (100.0) 825 (100.0) Table 4 Job by second mentioned newspaper subscription (percentages in brackets) No paper Popular paper Quality paper Regional paper Total No job 275 (39.9) 11 (33.3) 16 (39.0) 24 (38.7) 326 (39.5) Public service 134 (19.4) 9 (27.3) 9 (22.0) 16 (25.8) 168 (20.4) Private company 280 (40.6) 13 (39.4) 16 (39.0) 22 (35.5) 331 (40.1) Total 689 (100.0) 33 (100.0) 41 (100.0) 62 (100.0) 825 (100.0) Therefore I created a fuzzy cross table as explained above. It is displayed in Table 5. This table shows a significant relationship between job and newspaper subscription (χ 2 = 13.3, df = 6, p =.038) and may be interpreted as follows. People without a job are most inclined to subscribe a popular newspaper and are least inclined to subscribe a quality newspaper. People working in a private company are most likely not to subscribe a newspaper at all, and are least likely to subscribe a quality newspaper. And finally, in contrast with the others, people with a job in public service are most likely to subscribe a quality newspaper, and least likely to subscribe a popular newspaper. Public service employees stand out for their preference for quality newspapers (adjusted standardized residual = 2.753; all other adjusted standardized residuals < 1.96). If you serve the public, if you work for the common good, maybe you are more interested in quality information about your society.

Fuzzy cross table 7 Table 5 Fuzzy cross table of job by newspaper subscription (percentages in brackets) No paper Popular paper Quality paper Regional paper Total No job 72 (38.1) 51 (44.2) 51 (34.8) 152 (40.6) 326 (39.5) Public service 30 (15.9) 16.5 (14.3) 42 (28.7) 79.5 (21.3) 168 (20.4) Private company 87 (46.0) 48 (41.6) 53.5 (36.5) 142.5 (38.1) 331 (40.1) Total 189 (100.0) 115 (100.0) 146 (100.000) 374 (100.0) 825 (100.0) Discussion In this manuscript I showed the need to sometimes create a fuzzy cross table, how to do that with the help of SPSS, and I illustrated the use of such a table with the relationship between job and newspaper subscription. The results show that public service employees in the Netherlands in the year 2000 were more inclined than other people to read quality newspapers. Footnotes 1 Here I am assuming that the occupational class of every household member is equally important in determining the occupational class of the household as a whole. That is not necessarily a good assumption, but for the sake of a simple argument in explaining fuzzy coding it suffices. Of course you can weigh the class of the different household members in any particular way that suits your needs. References Konig, R., Jacobs, E., Hendriks Vettehen, P., Renckstorf, K., & Beentjes, H. (2005). Media use in the Netherlands 2000: Documentation of a national survey. Den Haag, The Netherlands: Data Archiving and Networked Services.

Fuzzy cross table 8 Kroonenberg, P. M., & Greenacre, M. J. (2006). Correspondence analysis. In S. Kotz, C. B. Read, N. Balakrishnan, & B. Vidakovic (Eds.), Encyclopedia of Statistical Sciences [Online edition]. John Wiley. Retrieved from http://www.mrw.interscience.wiley.com/emrw/9780471667193/ess/article/ess6018/cu rrent/pdf, January 16, 2009. Murtagh, F. (2005). Correspondence analysis and data coding with Java and R. Boca Raton, FL: Chapman & Hall/CRC. Appendix comment matrix procedure to create a "fuzzy cross table". matrix. get A /*indicator matrix A*/ /file=* /variables=cata1 cata2 cata3 cata4. /* use any number of categories */ get B /*indicator matrix B*/ /file=* /variables=catb1 catb2 catb3. /* use any number of categories */ compute observed=t(a)*b. /* cross table of A and B */ compute rowsum=rsum(observed). /* column vector with row totals */ compute colsum=csum(observed). /* row vector with column totals */ compute N=rsum(colsum). /* total number of observations */ compute expected=(rowsum*colsum)&/n. /* expected frequencies */ compute chisquar=rsum(csum(((observed-expected)&**2)&/expected)). /* Pearson's chi-square */ compute df=(ncol(observed)-1)*(nrow(observed)-1). /* degrees of freedom */ compute as=make(nrow(observed),ncol(observed),0). /* preparation for computation of adjusted standardized residual (asresid) */ compute rowperc=make(nrow(observed),ncol(observed),0). /* preparation for computation of row percentages*/ compute colperc=make(nrow(observed),ncol(observed),0). /* preparation for computation of column percentages*/ compute tosmall=0. /* preparation for counting cells with expected frequency < 5 */ compute G2=0. /* preparation for computation of likelihood ratio chi-square */ loop i=1 to nrow(expected) by 1. /* process the rows one by one */ + loop j=1 to ncol(expected) by 1. /* within row, process the columns one by one */ + compute as(i,j)=((rowsum(i)*colsum(j)*(1-(rowsum(i)/n))*(1-(colsum(j)/n)))/n)&**.5. + /* part of computation of adjusted standardized residual */ + do if expected(i,j)<5. + compute tosmall=tosmall+1. /* count cells with expected frequencies < 5 */ + end if. + do if observed(i,j) <> 0. /* if cell empty, ln(observed/expected) not defined, */ + /* but lim n->0 n*ln(n) = 0, so the contribution of this cell is 0 */ + compute G2=G2+(2*observed(i,j)*ln(observed(i,j)/expected(i,j))). + /* cell contribution to likelihood ratio chi-square */ + compute rowperc(i,j)=100*observed(i,j)/rowsum(i). /* row percentage */ + compute colperc(i,j)=100*observed(i,j)/colsum(j). /* column percentage */ + end if. + end loop. end loop. compute asresid=(observed-expected)&/as. /* adjusted standardized residual */ print {observed,rowsum;colsum,n} /* display the results */

Fuzzy cross table 9 /title="crosstable of (fuzzy coded) variables A and B" /clabels="catb1" "catb2" "catb3" "total" /* choose category labels */ /rlabel="cata1" "cata2" "cata3" "cata4" "total". /* for columns and rows */ print {chisquar,df,(1-chicdf(chisquar,df));g2,df,(1-chicdf(g2,df))} /title="chi-square-test of statistical independence" /clabels="chi2" "df" "p" /rlabels="pearson" "likelihood ratio". print {tosmall,(tosmall*100)/(nrow(expected)*ncol(expected))} /title="number of cells with expected frequencies < 5" /clabels="count" "%". print {cmin(rmin(expected))} /title="smallest expected frequency". print {rowperc,make(nrow(observed),1,100)} /title="row percentages" /clabels="catb1" "catb2" "catb3" "total" /* choose category labels */ /rlabel="cata1" "cata2" "cata3" "cata4". /* for columns and rows */ print {colperc;make(1,ncol(observed),100)} /title="column percentages" /clabels="catb1" "catb2" "catb3" /* choose category labels */ /rlabel="cata1" "cata2" "cata3" "cata4" "total". /* for columns and rows */ print asresid /title="adjusted standardized residuals" /clabels="catb1" "catb2" "catb3" /* choose category labels */ /rlabel="cata1" "cata2" "cata3" "cata4". /* for columns and rows */ end matrix.