rasch: A SAS macro for fitting polytomous Rasch models

Size: px

Start display at page:

Download "rasch: A SAS macro for fitting polytomous Rasch models"

Charity Whitehead
5 years ago
Views:

1 rasch: A SAS macro for fitting polytomous Rasch models Karl Bang Christensen National Institute of Occupational Health, Denmark and Department of Biostatistics, University of Copenhagen May 27, 2004 Abstract This note describes a SAS macro that can be used to estimate the item parameters of a polytomous Rasch model using conditional maximum likelihood (CML). 1 Introduction Rasch models (Rasch, 1960; Fischer & Molenaar, 1995) are widely used for modeling of latent variables measured using ordinal scales. Traditionally, specialized software has been needed and this have limited the use of Rasch models. This note describes a SAS macros rasch for conditional maximum likelihood (CML) estimation and a graphical test of fit of polytomous Rasch models. The use of the macro is illustrated using a data set test containing five items ITEM1, ITEM2, ITEM3, ITEM4, and ITEM5 scored 0, 1, 2 and binary covariates X, Y, and Z. The macro creates output files in the form of SAS data sets and write results to the log file. An overview of input and output files is given in Table 1. 1

2 Table 1: Input and output data sets used by the the SAS macro latreg. Macro Input Comments Output Comments rasch test contains out par estimates ˆβ ih, cf. (3) items out logl log likelihood value out regr contains sum score t and log(ˆγ) s lg0-lgt 2 CML estimation An important feature of Rasch models is the sufficiency of the raw score making consistent estimation of item parameters without reference to the distribution of the latent variable in the population possible. Generalized linear models are standard statistical models (Nelder & Wedderburn, 1972; McCullagh & Nelder, 1989). CML estimation in the dichotomous and polytomous Rasch model can be done using standard software for fitting generalized linear models, (Tjur, 1982; Kelderman, 1984, 1992; Agresti, 1993). Assume that response categories are scored 0, 1,..., H and let a persons answers to the items i = 1,..., I be represented by the indicators x = (x ih ) i=1,...,i,h=1,...,h, where x ih = 1 if the person gives the answer h to item i and 0 otherwise. The conditional probability given the person parameter θ is p(x θ) = exp([ i h x ihh]θ + i h x ihη ih ) i l exp(lθ + η (1) il) and the likelihood function is the product of these. The model is identified if i : η i0 = 0 i η im = 0. (2) These constraints are identical to the ones proposed for the Partial Credit model (Masters, 1982) using the so-called threshold parameters β ih = (η ih η i,h 1 ) (3) 2

3 for i = 1,..., I and h = 1,..., H. The conditional likelihood function is the product of the conditional probabilities given the score t = i h x ihh. These can be written p(x t) = exp( i h x ihη ih ) (4) γ t where γ t = γ t (η) = (t) x exp( i η ix i ) are well-known gamma-polynomials, ( (t) x is short notation for the sum over all response vectors x with sum t). To estimate the item parameters the data is transformed to a m k contingency table (N x ) x X, where X is the set of all possible responses. The expected numbers are EN x = N t p(x t) and this yields a generalized linear model log(en x ) = N t γ t i x ih η ih (5) h it can be shown that the likelihood equations in this model are the CML equations (Agresti, 1993). The resulting parameter estimates must be standardized according to (2), i.e. η ih η ih h km i η im. The macro rasch that calculates these estimates is called by writing %rasch(data=test, items=item1-item3, max=2, out=rasch, write=yes, fittest=yes); specifying that the data set test contains the items ITEM1, ITEM2, and ITEM3 scored 0, 1, 2. The sample size, the number of items and the number of response categories is written to the log file, followed by the estimated item parameters, estimated values of gamma polynomials, and the maximum value of the log likelihood function (this option can be switched off by specifying write=no). 3

4 2.1 Options The arguments are summarized in Table 2. Table 2: Arguments to the SAS macro rasch. Argument Comments data= Input data set containing the items items= Item names, written as V1 V2 V3 V4 (statements like V1-V4 can be used) max= Maximum value of items (minimum value is assumed to be 0, items assumed to have same number of response categories) out= First part of name given to output files (default value RASCH) write= If YES (the default) information is written to the log file fittest= If YES (the default) observed and expected values are plotted (cf. 2.2) Three data sets RASCH par, RASCH logl and RASCH regr are created. The two first contain estimated item parameters and maximum log likelihood value. The data set RASCH regr is a version of the input data set containing extra variables that can be used to fit regression models with latent outcomes (a SAS macro, latreg, that can do this is available from This data set contains the score t = i x i and the estimated values of log(γ t ) for each possible score value t = 0, 1,..., T = I H. 2.2 Goodness of fit Specifying fittest=yes creates plots of observed and expected item category frequencies stratified by the total score: Let N t denote the number of persons with score t, and let N iht denote the number of these persons giving the answer h to item i. For for each combination (i, h) {1,..., I} {1,..., H} the macro plots t p(x ih = 1 t) and t N iht N t. These plots are closely related to plots of the Item Characteristic Curves (or Item Response Functions) θ p(x ih = 1 θ), because t is a sufficient statistic. 4

5 3 Comments Because the macro uses the contingency table of item responses no responses must be missing. It is assumed that all items use the same number of response categories. The macro fits a Poisson regression model to the observed counts of the contingency table, if the estimation procedure fails to converge a warning or error message is printed. References Agresti, A. (1993). Computing conditional maximum likelihood estimates for generalized Rasch models using simple loglinear models with diagonals parameters. Scandinavian Journal of Statistics, 20 (1), Fischer, G. H., & Molenaar, I. W. (1995). Rasch models - foundations, recent developments, and applications. Springer-Verlag. Kelderman, H. (1984). Loglinear Rasch model tests. Psychometrika, 49, Kelderman, H. (1992). Computing maximum likelihood estimates of loglinear models from marginal sums with special attention to loglinear item response theory. Psychometrika, 57, Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, McCullagh, P., & Nelder, J. A. (1989). Generalized linear models (2 ed.). Chapman and Hall. Nelder, J. A., & Wedderburn, R. W. M. (1972). Generalized linear models. Journal of the Royal Statistical Society A, 135, Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen: Danish Naitional Institute for Educational Research (Expanded edition, Chicago: University of Chicago Press; 1980). Tjur, T. (1982). A connection between Rasch s item analysis model and a multiplicative poisson model. Scandinavian Journal of Statistics, 9 (1),

Polytomous Rasch models in SAS

Polytomous Rasch models in SAS July 4-5, 2011, Paris, ProQoL Agenda 1 The polytomous Rasch model Parameter estimation Graphics Agenda 1 The polytomous Rasch model Parameter estimation Graphics 2 Examples of existing software Agenda