
Paper 241

Generating Item Responses Based on Multidimensional Item Response Theory

Jeffrey D. Kromrey, Cynthia G. Parshall, Walter M. Chason, Qing Yi
University of South Florida

ABSTRACT

The purpose of this paper is to demonstrate code written in SAS/IML software that generates examinees' test responses (0/1s) based on a multidimensional item response theory (MIRT) model. This program reads in a file of calibrated item parameters from the NOHARM computer program (Fraser & McDonald, 1986) and generates normally distributed random variables to represent examinees' ability levels on each dimension. The SAS/IML program calculates the probability of an examinee obtaining a correct response based on the MIRT model, then compares this probability with a uniform random number to decide the examinee's item response. If the probability is larger than the random number, the examinee is credited with a correct response (i.e., an item score of 1); otherwise, the item score is zero. The program allows control of the number of samples, the number of examinees, and the number of items for which item responses are generated.

INTRODUCTION

Item response theory (IRT; Lord, 1952, 1953a, 1953b) applies a set of mathematical models to indicate the interaction between an examinee's ability (θ) or a composite of abilities and the characteristics of items in a test. In IRT models, θ̂ is used to denote an examinee's estimated level of the latent trait, ability, or skill that is measured by test items. Many different types of models have been developed in IRT (e.g., van der Linden & Hambleton, 1996). In this presentation, however, attention is focused on three-parameter models for dichotomously scored items (i.e., correct/not correct; 0/1). In IRT, as an examinee's ability (θ) increases so does the probability of answering an item correctly.
The probability of an examinee answering an item correctly in the three-parameter logistic IRT model can be defined as

$$P_i(\theta_j) = c_i + \frac{1 - c_i}{1 + e^{-D a_i (\theta_j - b_i)}}$$

where

$e$ is the base of the natural logarithms and equals 2.71828…,
$i$ indexes test items ($i = 1, 2, 3, \ldots, n$),
$j$ indexes examinees ($j = 1, 2, 3, \ldots, N$),
$a_i$ is the item discrimination index for item $i$, which is proportional to the slope of the item response function at the point $\theta = b_i$,
$b_i$ is the item difficulty index for item $i$, that is, the point on the ability scale at which an examinee has a probability of $(1 + c_i)/2$ of answering item $i$ correctly,
$c_i$ is the lower asymptote parameter of the item response function for item $i$, which represents the probability of examinees with very low ability correctly answering the item,
$\theta_j$ represents the ability of examinee $j$,
$P_i(\theta_j)$ is the probability of examinee $j$ with ability level $\theta_j$ answering item $i$ correctly, and
$D$ is a scaling factor that equals 1.702.

IRT includes a group of assumptions about the data to which the models apply (Hambleton, Swaminathan, & Rogers, 1991). One assumption is called the assumption of unidimensionality, which means that only one ability or one composite of multiple abilities is measured by a test. However, many educational and psychological tests measure several latent traits rather than a single one (Reckase, Ackerman, & Carlson, 1988; Traub, 1983), and the extent to which individual items reflect each trait can vary from item to item (Ackerman, 1994b). For example, a simple mathematics story problem may require both reading and mathematics skills to provide a correct response. Examinees may bring a variety of cognitive skills to a testing situation, some of which may be used during the test and some not.

Miller and Hirsch (1992) indicated that substantial measurement problems may arise if a unidimensional item analysis procedure is used with multidimensional items. For example, problems can occur in the process of constructing a test using classical test theory procedures, or when the statistics provide no indication of what abilities are being measured by items or how well each ability is measured. Therefore, researchers have been advised to use MIRT when the unidimensionality assumption is violated (Reckase, 1985; Ackerman, 1994a).

The MIRT models do not require the assumption of unidimensionality. The probability of a correct response to item $i$ in an $m$-dimensional logistic model (Reckase, 1985) can be expressed as

$$P(u_{ij} = 1 \mid \boldsymbol{\theta}_j) = c_i + (1 - c_i)\,\frac{\exp[1.702\,\mathbf{a}_i \boldsymbol{\theta}_j + d_i]}{1.0 + \exp[1.702\,\mathbf{a}_i \boldsymbol{\theta}_j + d_i]}$$

where

$u_{ij}$ is examinee $j$'s score (0/1) on item $i$ ($i = 1, 2, 3, \ldots, n$),
$\mathbf{a}_i$ is the vector of item discrimination parameters ($a_{ik} = a_{i1}, a_{i2}, a_{i3}, \ldots, a_{im}$) for item $i$ in $k$ dimensions ($k = 1, 2, 3, \ldots, m$),
$d_i$ is the scalar difficulty parameter for item $i$; negative $d$ values represent difficult items, and positive values represent easy items,
$c_i$ is the scalar lower asymptote parameter for item $i$,
$\boldsymbol{\theta}_j$ is the vector of θ̂ for person $j$ ($j = 1, 2, 3, \ldots, N$), and
$P(u_{ij} = 1 \mid \boldsymbol{\theta}_j)$ is the probability of examinee $j$ correctly answering item $i$.

In this model, there is an item discrimination parameter for each dimension of the model but only one overall item difficulty parameter. The components in the function are additive; thus, being low on one latent trait can be compensated for by being high on another trait.
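As an illustration of the two response functions defined above, the following Python sketch (standard library only; it is not part of GENMIRT.SAS) evaluates the three-parameter logistic model and the compensatory MIRT model. The function names `p_3pl` and `p_mirt` and all parameter values are hypothetical, and the sketch assumes that the constant 1.702 multiplies only the a·θ term, as the MIRT equation is printed here.

```python
import math

D = 1.702  # scaling factor from the text


def p_3pl(theta, a, b, c):
    """Three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def p_mirt(theta, a, d, c):
    """Compensatory MIRT probability for ability vector theta and
    discrimination vector a (same length), difficulty d, lower asymptote c.
    Assumption: 1.702 scales only the a.theta term, as printed above."""
    z = D * sum(ak * tk for ak, tk in zip(a, theta)) + d
    # exp(z) / (1 + exp(z)) written in the equivalent form 1 / (1 + exp(-z))
    return c + (1.0 - c) / (1.0 + math.exp(-z))


# At theta = b the 3PL probability equals (1 + c) / 2.
p_at_b = p_3pl(theta=0.5, a=1.2, b=0.5, c=0.2)

# Compensation: a low first trait offset by a high second trait gives the
# same weighted sum a.theta, hence the same probability of success.
a_item = [1.0, 0.5]
p_balanced = p_mirt([0.0, 0.0], a_item, d=-0.3, c=0.2)
p_offset = p_mirt([-1.0, 2.0], a_item, d=-0.3, c=0.2)
```

Here `p_balanced` and `p_offset` coincide because both ability vectors produce a weighted sum of zero, which is the additive, compensatory property described in the text.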
NOHARM (Fraser & McDonald, 1986) and TESTFACT (Wilson, Wood, & Gibbons, 1984) are two of the computer programs that estimate parameters for the MIRT model. NOHARM fits the normal ogive model by a least-squares procedure and will estimate the $a_k$ and $d$ parameters in the MIRT model. NOHARM does not estimate the $c$ parameters but requires values to be input and treated as fixed. Usually the $c$ parameters are estimated from a unidimensional analysis using a computer program such as BILOG (Mislevy & Bock, 1990). Miller and Hirsch (1992) indicated that asymptotically the $c$ values are the same for models of any number of dimensions. The original NOHARM program can handle as many as six dimensions of item parameters in a multidimensional space.

Previous research has indicated that simulated data based on a MIRT model are more similar to real test data than are data generated by other approaches (Davey, Nering, & Thompson, 1997; Parshall, Kromrey, Chason, & Yi, 1997). Recently, researchers have used MIRT as the basis of data simulations (e.g., Parshall, Davey, & Nering, 1998; Yi & Nering, 1998a, 1998b). The SAS code in this presentation demonstrates how to use SAS/IML to simulate data according to a MIRT model.

GENMIRT.SAS

The program GENMIRT.SAS uses SAS/IML to simulate examinees' responses according to the MIRT model. The program requires, as input, MIRT parameters for a set of test items. These parameters may be obtained from a MIRT calibration program, such as NOHARM (Fraser & McDonald, 1986). The output from the program is an ASCII file of item scores (0/1s), representing correct and incorrect responses to each test item. The output file is a series of N X K matrices, in which N is the number of examinees simulated, and K is the number of items on the simulated test. The item score matrices are augmented with examinee identification numbers and the examinee ability level (θ) for each dimension.

The program GENMIRT.SAS operates in six major steps, as follows:

1. Read MIRT item parameters. As written, the MIRT parameters are read from an external file into a SAS data step, then passed to SAS/IML.

2. Establish the number of samples and number of examinees to generate. The two nested do loops (DO REP = 1 to 100, and DO I = 1 to 1000) establish, respectively, the number of samples to generate and the number of examinees in each sample for whom responses will be simulated. Simply changing the maximum values of REP and I in these two loops changes the number of samples or number of examinees to be simulated.

3. Simulate examinee ability on each dimension. Generate six random numbers from an NID(0,1) distribution. These values are used as the examinee's true ability levels on the six MIRT dimensions.

4. Generate a uniform random number for each examinee and for each item on the test. To simulate the probabilistic nature of test item responses, a uniform random number (U), on the 0 to 1 interval, is compared to the calculated probability of a correct response for each item ($P_i$). If $P_i > U$ then the examinee is credited with a correct response to the item (receiving an item score of 1). Conversely, if $P_i \le U$ then the examinee obtains an incorrect response to the item (receiving an item score of 0).

5. Calculate a vector of item scores for each examinee. The subroutine IRTSCORE is used to calculate each examinee's probability of correct response to each item.
The inputs to this subroutine are the number of test items for the simulation (the scalar quantity NITEMS), the 6 X 1 vector of examinee ability parameters (SIMULEES), an examinee identification number (IDN2), the 1 X NITEMS vector of uniform random numbers that are compared to the probabilities of correct responses for the set of items (RRV), and the vectors of MIRT item parameters (POPA, POPB, and POPC). For each item, the probability of a correct response is calculated using the PROBNORM function, and the probability is compared to the value of the uniform random number. The subroutine returns a vector of 1s and 0s that represent the examinee responses to the set of items (SCORE).

6. Create the output file. The elements of the vectors SIMULEES and SCORE are placed into scalars so that the FILE and PUT statements will write them to an ASCII file.

PROGRAM CODE

    options ls=182 ps=32767 pageno=1 formdlm='-';
    proc printto print='c:\500a.raw';
    run;

    * +-----------------------------------------------------------+
        GENMIRT.SAS
        Generate a file of item scores (0,1) based on
        six-factor MIRT model.
      +-----------------------------------------------------------+;

    data params;
    * +-----------------------------------------------------------+
        This is a file of known item parameters, separated by at
        least one blank and including an item number on each record.
      +-----------------------------------------------------------+;
      infile 'a:\in80.prs' lrecl=124 missover;
      input itemnum a1 a2 a3 a4 a5 a6 b c;
    run;

    proc iml;

    * Define the subroutine to analyze each examinee response vector;
    start irtscore (nitems, simulees, idn2, rrv, popa, popb, popc, score);
      * POPA is NITEMS x 6 and SIMULEES is 6 x 1, so FACTNORM is NITEMS x 1;
      factnorm = probnorm(popb + (popa*simulees));
      * The following yields a vector of probabilities of correct
        responses on each item (p);
      P = (popc + ((1 - popc) # factnorm));
      * The following yields the score vector (1,0). P is transposed
        to a row so that it conforms with the 1 x NITEMS vector RRV;
      score = (t(P) > rrv);
    finish;

    use params;

    * Reading in the vectors of item parameters;
    read all var {a1 a2 a3 a4 a5 a6} into popa;
    read all var {b} into popb;
    read all var {c} into popc;
    nitms = nrow(popa);

    * This loop generates six theta values for each examinee
      and a set of NITMS random numbers;
    DO REP = 1 TO 100;
      DO I = 1 to 1000;
        seed1 = round(100000000*ranuni(0));
        idn2 = i;

        * Generation of theta values from N(0,1) distribution;
        sim1 = rannor(seed1);
        sim2 = rannor(seed1);
        sim3 = rannor(seed1);
        sim4 = rannor(seed1);
        sim5 = rannor(seed1);
        sim6 = rannor(seed1);
        simulees = sim1//sim2//sim3//sim4//sim5//sim6;

        * Generation of uniform random numbers for each person and each
          test item. These are used to determine item response correctness;
        rrv = J(1,nitms,0);
        do k = 1 to nitms;
          rrv[1,k] = RANUNI(seed1);
        end;

        * Call the scoring subroutine;
        run irtscore (nitms, simulees, idn2, rrv, popa, popb, popc, score);

        * Create variables for the output data file;
        idnum = idn2[1,1];
        thet1 = simulees[1,1];
        thet2 = simulees[2,1];
        thet3 = simulees[3,1];
        thet4 = simulees[4,1];
        thet5 = simulees[5,1];
        thet6 = simulees[6,1];
        itm1 = score[1,1];
        itm2 = score[1,2];
        itm3 = score[1,3];
        itm4 = score[1,4];
        itm5 = score[1,5];
        [ etc. for each item ]
        itm79 = score[1,79];
        itm80 = score[1,80];

        file print;
        put @1 idnum 4.
            @6 thet1 12.8
            @20 itm1 1.
            @21 itm2 1.
            @22 itm3 1.
            @23 itm4 1.
            [ etc. for each item ]
            @98 itm79 1.
            @99 itm80 1.
            @110 thet2 12.8
            @125 thet3 12.8
            @140 thet4 12.8
            @155 thet5 12.8
            @170 thet6 12.8;
      end;
    end;
    quit;

CONCLUSION

GENMIRT.SAS provides a simple vehicle for the simulation of realistic examinee test item responses.
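The logic of GENMIRT.SAS can also be sketched outside SAS. The following Python fragment (standard library only) is an illustrative re-expression under stated assumptions, not a translation supplied by the authors: the item parameters are made-up values for two items on two dimensions rather than NOHARM output, the function names are hypothetical, and Python's `random` generator stands in for RANNOR and RANUNI, so the generated responses will not match those of the SAS program.

```python
import math
import random


def probnorm(x):
    """Standard normal CDF, the quantity SAS PROBNORM returns."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def irtscore(theta, items, uniforms):
    """Return 0/1 item scores for one examinee.
    items: list of (a_vector, d, c); uniforms: one U(0,1) draw per item."""
    scores = []
    for (a, d, c), u in zip(items, uniforms):
        z = d + sum(ak * tk for ak, tk in zip(a, theta))
        p = c + (1.0 - c) * probnorm(z)   # probability of a correct response
        scores.append(1 if p > u else 0)  # correct only when P > U
    return scores


def simulate(items, n_samples, n_examinees, n_dims, seed=12345):
    """Yield (sample, examinee id, theta, scores) records."""
    rng = random.Random(seed)  # hypothetical fixed seed for repeatability
    for rep in range(1, n_samples + 1):
        for i in range(1, n_examinees + 1):
            theta = [rng.gauss(0.0, 1.0) for _ in range(n_dims)]
            uniforms = [rng.random() for _ in items]
            yield rep, i, theta, irtscore(theta, items, uniforms)


# Hypothetical parameters: two items, two dimensions.
ITEMS = [([1.0, 0.3], 0.5, 0.2), ([0.4, 1.1], -0.8, 0.25)]
records = list(simulate(ITEMS, n_samples=2, n_examinees=5, n_dims=2))
```

Each record plays the role of one line of the ASCII output file: an examinee identifier, the simulated abilities, and the 0/1 item scores. Changing `n_samples` and `n_examinees` corresponds to changing the upper limits of the REP and I loops.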
The data simulated by this program may be used for research on a variety of issues related to psychometrics, such as the accuracy and precision of methods to estimate examinee ability, strategies for test equating, phenomena associated with computer adaptive testing algorithms, and techniques to detect differential item functioning.

REFERENCES

Ackerman, T. A. (1994a). Using multidimensional item response theory to understand what items and tests are measuring. Applied Measurement in Education, 7(4), 255-278.

Ackerman, T. A. (1994b). Creating a test information profile for a two-dimensional latent space. Applied Psychological Measurement, 18(3), 257-275.

Davey, T., Nering, M., & Thompson, T. (1997, March). Realistic simulation of item response data. Paper presented at the annual meeting of the National Council on Measurement in Education, Chicago, IL.

Fraser, C. & McDonald, R. (1986). NOHARM II: A FORTRAN program for fitting unidimensional and multidimensional normal ogive models of latent trait theory. Armidale, Australia: University of New England, Center for Behavioral Studies.

Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Sage Publications.

Lord, F. M. (1952). A theory of test scores. Psychometric Monograph, 7.

Lord, F. M. (1953a). An application of confidence intervals and maximum likelihood to the estimation of an examinee's ability. Psychometrika, 18, 57-75.

Lord, F. M. (1953b). The relation of test score to the trait underlying the test. Educational and Psychological Measurement, 13, 517-548.

Miller, T. R. & Hirsch, T. M. (1992). Cluster analysis of angular data in applications of multidimensional item-response theory. Applied Measurement in Education, 5(3), 193-211.

Mislevy, R. J. & Bock, R. D. (1990). BILOG3: Item analysis and test scoring with binary logistic models. [Computer program]. Chicago, IN: Scientific Software.

Parshall, C. G., Davey, T., & Nering, M. (1998, April). Test development exposure control for adaptive testing. In T. Miller (Chair), Adaptive Testing Research at ACT. Symposium conducted at the annual meeting of the National Council on Measurement in Education, San Diego, CA.

Parshall, C. G., Kromrey, J. D., Chason, W. M., & Yi, Q. (1997, June). Evaluation of parameter estimation under modified IRT models and small samples. Paper presented at the annual meeting of the Psychometric Society, Gatlinburg, TN.

Reckase, M. D. (1985, April). The difficulty of test items that measure more than one ability. Paper presented at the annual meeting of the American Educational Research Association, Chicago, IL.

Reckase, M. D., Ackerman, T. A., & Carlson, J. E. (1988). Building a unidimensional test using multidimensional items. Journal of Educational Measurement, 25(3), 193-203.

Traub, R. E. (1983). A priori considerations in choosing an item response model. In R. K. Hambleton (Ed.), Applications of item response theory (pp. 57-70). Vancouver, BC: Educational Research Institute of British Columbia.

van der Linden, W. J. & Hambleton, R. K. (Eds.). (1996). Handbook of modern item response theory. New York, NY: Springer-Verlag.

Wilson, D., Wood, R., & Gibbons, R. (1984). TESTFACT: Test scoring and full-information item factor analysis. [Computer program]. Mooresville, IN: Scientific Software, Inc.

Yi, Q. & Nering, M. (1998a, April). Nonmodel-fitting responses and robust ability estimation in a realistic CAT environment. Paper presented at the annual meeting of the American Educational Research Association, San Diego, CA.

Yi, Q. & Nering, M. (1998b, April). The impact of nonmodel-fitting responses in a realistic CAT environment. In M. Nering (Chair), Innovations in person-fit research. Symposium conducted at the annual meeting of the National Council on Measurement in Education, San Diego, CA.

SAS/IML is a registered trademark of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

CONTACT INFORMATION

The authors can be contacted at the University of South Florida, Department of Educational Measurement and Research, FAO 100U, 4202 East Fowler Ave., Tampa, FL 33620, by telephone (813) 974-3220, or Jeff can be contacted by e-mail: kromrey@typhoon.coedu.usf.edu