Paper 241

Generating Item Responses Based on Multidimensional Item Response Theory

Jeffrey D. Kromrey, Cynthia G. Parshall, Walter M. Chason, Qing Yi
University of South Florida

ABSTRACT

The purpose of this paper is to demonstrate code written in SAS/IML software that generates examinees' test responses (0/1s) based on a multidimensional item response theory (MIRT) model. This program reads in a file of calibrated item parameters from the NOHARM computer program (Fraser & McDonald, 1986) and generates normally distributed random variables to represent examinees' ability levels on each dimension. The SAS/IML program calculates the probability of an examinee obtaining a correct response based on the MIRT model, then compares this probability with a uniform random number to decide the examinee's item response. If the probability is larger than the random number, the examinee is credited with a correct response (i.e., an item score of 1); otherwise, the item score is zero. The program allows control of the number of samples, the number of examinees, and the number of items for which item responses are generated.

INTRODUCTION

Item response theory (IRT; Lord, 1952, 1953a, 1953b) applies a set of mathematical models to indicate the interaction between an examinee's ability (θ), or a composite of abilities, and the characteristics of items in a test. In IRT models, $\hat{\theta}$ is used to denote an examinee's estimated level of the latent trait, ability, or skill that is measured by test items. Many different types of models have been developed in IRT (e.g., van der Linden & Hambleton, 1996). In this presentation, however, attention is focused on three-parameter models for dichotomously scored items (i.e., correct/not correct; 0/1). In IRT, as an examinee's ability (θ) increases, so does the probability of answering an item correctly.
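The decision rule summarized in the abstract (credit a correct response when the model probability exceeds a uniform random number) amounts to a Bernoulli draw: over many draws, the proportion of 1s approaches the probability itself. The following is a minimal Python sketch of that rule only, not the authors' SAS/IML code; the function name `item_response`, the probability 0.7, and the seed are illustrative assumptions.

```python
import random


def item_response(p_correct, u):
    """Return 1 if the model probability exceeds the uniform draw u, else 0."""
    return 1 if p_correct > u else 0


# Illustration with an assumed probability of 0.7: the share of simulated
# responses scored 1 should be close to 0.7.
rng = random.Random(12345)  # arbitrary seed, used only for reproducibility
n_draws = 100_000
scores = [item_response(0.7, rng.random()) for _ in range(n_draws)]
proportion_correct = sum(scores) / n_draws
print(round(proportion_correct, 3))
```

Because the comparison is strict, a draw exactly equal to the probability is scored 0, which matches the rule as stated in the paper.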
The probability of an examinee answering an item correctly in the three-parameter logistic IRT model can be defined as

\[ P_i(\theta_j) = c_i + \frac{1 - c_i}{1 + e^{-D a_i (\theta_j - b_i)}} \]

where
$e$ is the base of the natural logarithms and equals 2.71828...,
$i$ indexes test item ($i = 1, 2, 3, \ldots, n$),
$j$ indexes examinee ($j = 1, 2, 3, \ldots, N$),
$a_i$ is the item discrimination index for item $i$, which is proportional to the slope of the item response function at the point $\theta = b_i$,
$b_i$ is the item difficulty index for item $i$, that is, the point on the ability scale at which an examinee has probability $(1 + c_i)/2$ of answering item $i$ correctly,
$c_i$ is the lower asymptote parameter of the item response function for item $i$, which represents the probability of examinees with very low ability correctly answering the item,
$\theta_j$ represents the ability of examinee $j$,
$P_i(\theta_j)$ is the probability of examinee $j$ with ability level $\theta_j$ answering item $i$ correctly, and
$D$ is a scaling factor that equals 1.702.

IRT includes a group of assumptions about the data to which the models apply (Hambleton, Swaminathan, & Rogers, 1991). One assumption is called the assumption of unidimensionality, which means that only one ability or one composite of multiple abilities is measured by a test. However, many educational and psychological tests measure several latent traits rather than a single one (Reckase, Ackerman, & Carlson, 1988; Traub, 1983), and the extent to which individual items reflect each trait can vary from item to item (Ackerman, 1994b). For example, a simple mathematics story problem may require both reading and mathematics skills to provide a correct response. Examinees may bring a variety of cognitive skills to a testing situation, some of which may be used during the test and some not.

Miller and Hirsch (1992) indicated that substantial measurement problems may arise if a unidimensional item analysis procedure is used with multidimensional items. For example, problems can occur in the process of constructing a test using classical test theory procedures, or when the statistics provide no indication of what abilities are being measured by items or how well each ability is measured. Therefore, researchers have been advised to use MIRT when the unidimensionality assumption is violated (Reckase, 1985; Ackerman, 1994a).

The MIRT models do not require the assumption of unidimensionality. The probability of a correct response to item $i$ in an $m$-dimensional logistic model (Reckase, 1985) can be expressed as

\[ P(u_{ij} = 1 \mid \boldsymbol{\theta}_j) = c_i + (1 - c_i)\,\frac{\exp[1.702\,\mathbf{a}_i'\boldsymbol{\theta}_j + d_i]}{1.0 + \exp[1.702\,\mathbf{a}_i'\boldsymbol{\theta}_j + d_i]} \]

where
$u_{ij}$ is examinee $j$'s score (0/1) on item $i$ ($i = 1, 2, 3, \ldots, n$),
$\mathbf{a}_i$ is the vector of item discrimination parameters ($a_{ik} = a_{i1}, a_{i2}, a_{i3}, \ldots, a_{im}$) for item $i$ in $k$ dimensions ($k = 1, 2, 3, \ldots, m$),
$d_i$ is the scalar difficulty parameter for item $i$; negative $d$ values represent difficult items, and positive values represent easy items,
$c_i$ is the scalar lower asymptote parameter for item $i$,
$\boldsymbol{\theta}_j$ is the vector of $\hat{\theta}$ for person $j$ ($j = 1, 2, 3, \ldots, N$), and
$P(u_{ij} = 1 \mid \boldsymbol{\theta}_j)$ is the probability of examinee $j$ correctly answering item $i$.

In this model, there is an item discrimination parameter for each dimension of the model but only one overall item difficulty parameter. The components in the function are additive; thus, being low on one latent trait can be compensated for by being high on another trait.
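The two response functions above, and the compensatory property of the MIRT model, can be illustrated with a short Python sketch. This is an illustration under stated assumptions, not the authors' code: the function names `p_3pl` and `p_mirt` and all item parameter values are hypothetical, and the exponent of the MIRT function is taken exactly as printed in the text (1.702 multiplies the a'θ term, and d is added afterwards).

```python
import math

D = 1.702  # scaling factor given in the text


def p_3pl(theta, a, b, c):
    """Unidimensional three-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def p_mirt(theta, a, d, c):
    """Compensatory MIRT logistic probability.

    theta and a are equal-length sequences with one entry per dimension.
    exp(z) / (1 + exp(z)) is computed in the equivalent form 1 / (1 + exp(-z)).
    """
    z = D * sum(a_k * t_k for a_k, t_k in zip(a, theta)) + d
    return c + (1.0 - c) / (1.0 + math.exp(-z))


# Hypothetical item: at theta == b the 3PL probability equals (1 + c) / 2
# up to floating-point rounding.
at_difficulty = p_3pl(theta=0.5, a=1.2, b=0.5, c=0.2)

# Compensation: with equal discriminations on two dimensions, an examinee who
# is low on one trait and high on the other has the same probability as an
# examinee who is average on both.
a_item = (1.0, 1.0)
p_average = p_mirt((0.0, 0.0), a_item, d=0.0, c=0.2)
p_low_high = p_mirt((-1.0, 1.0), a_item, d=0.0, c=0.2)
print(at_difficulty, p_average, p_low_high)
```

With a single dimension and d set to -1.702·a·b, `p_mirt` reduces to `p_3pl`, which is one way to see how the single difficulty parameter of the multidimensional model relates to the unidimensional b.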
NOHARM (Fraser & McDonald, 1986) and TESTFACT (Wilson, Wood, & Gibbons, 1984) are two of the computer programs that estimate parameters for the MIRT model. NOHARM (Fraser & McDonald, 1986) is a computer program that fits the normal ogive model by a least-squares procedure and will estimate the $a_k$ and $d$ parameters in the MIRT model. NOHARM does not estimate the $c$ parameters but requires values to be input and treated as fixed. Usually the $c$ parameters are estimated from a unidimensional analysis using a computer program such as BILOG (Mislevy & Bock, 1990). Miller and Hirsch (1992) indicated that asymptotically the $c$ values are the same for models of any number of dimensions. The original NOHARM program can handle as many as six dimensions of item parameters in a multidimensional space.

Previous research has indicated that simulated data based on a MIRT model are more similar to real test data than are data generated by other approaches (Davey, Nering, & Thompson, 1997; Parshall, Kromrey, Chason, & Yi, 1997). Recently, researchers have used MIRT as the basis of data simulations (e.g., Parshall, Davey, & Nering, 1998; Yi & Nering, 1998a, 1998b). The SAS code in this presentation demonstrates how to use SAS/IML to simulate data according to a MIRT model.

GENMIRT.SAS

The program GENMIRT.SAS uses SAS/IML to simulate examinees' responses according to the MIRT model. The program requires, as input, MIRT parameters for a set of test items. These parameters may be obtained from a MIRT calibration program, such as NOHARM (Fraser & McDonald, 1986). The output from the program is an ASCII file of item scores (0/1s), representing correct and incorrect responses to each test item. The output file is a series of N X K matrices, in which N is the number of examinees simulated, and K is the number of items on the simulated test. The item score matrices are augmented with examinee identification numbers and the examinee ability level (θ) for each dimension.

The program GENMIRT.SAS operates in six major steps, as follows:

1. Read MIRT item parameters. As written, the MIRT parameters are read from an external file into a SAS data step, then passed to SAS/IML.
2. Establish the number of samples and number of examinees to generate. The two nested do loops (DO REP = 1 to 100, and DO I = 1 to 1000) establish, respectively, the number of samples to generate and the number of examinees in each sample for whom responses will be simulated. Simply changing the maximum values of REP and I in these two loops changes the number of samples or number of examinees to be simulated.
3. Simulate examinee ability on each dimension. Generate six random numbers from an NID(0,1) distribution. These values are used as the examinee's true ability levels on the six MIRT dimensions.
4. Generate a uniform random number for each examinee and for each item on the test. To simulate the probabilistic nature of test item responses, a uniform random number (U), on the 0 to 1 interval, is compared to the calculated probability of a correct response for each item ($P_i$). If $P_i$ > U then the examinee is credited with a correct response to the item (receiving an item score of 1). Conversely, if $P_i$ <= U then the examinee obtains an incorrect response to the item (receiving an item score of 0).
5. Calculate a vector of item scores for each examinee. The subroutine IRTSCORE is used to calculate each examinee's probability of correct response to each item.
   The inputs to this subroutine are the number of test items for the simulation (the scalar quantity NITEMS), the 6 X 1 vector of examinee ability parameters (SIMULEES), an examinee identification number (IDN2), the 1 X NITEMS vector of uniform random numbers that are compared to the probabilities of correct responses for the set of items (RRV), and the MIRT item parameters (POPA, POPB, and POPC). For each item, the probability of a correct response is calculated using the PROBNORM function, and the probability is compared to the value of the uniform random number. The subroutine returns a vector of 1s and 0s that represent the examinee responses to the set of items (SCORE).
6. Create the output file. The elements of the vectors SIMULEES and SCORE are placed into scalars so that the FILE and PUT statements will write them to an ASCII file.

PROGRAM CODE

options ls=182 ps=32767 pageno=1 formdlm='-';
proc printto print='c:\500a.raw';
* +-----------------------------------------------------------+
    GENMIRT.SAS
    Generate a file of item scores (0,1) based on
    six-factor MIRT model.
  +-----------------------------------------------------------+;
data params;
* +-----------------------------------------------------------+
    This is a file of known item parameters, separated by
    at least one blank and including an item number on
    each record.
  +-----------------------------------------------------------+;
  infile 'a:\in80.prs' lrecl=124 missover;
  input itemnum a1 a2 a3 a4 a5 a6 b c;
proc iml;
* Define the subroutine to analyze each examinee response vector;
start irtscore (nitems, simulees, idn2, rrv, popa, popb, popc, score);
  factnorm = probnorm(popb + (popa*simulees));
  * +-----------------------------------------------------------+
      The following yields a vector of probabilities of
      correct responses on each item (p).
    +-----------------------------------------------------------+;
  P = (popc + ((1 - popc) # factnorm));
  * +-----------------------------------------------------------+
      The following yields the score vector (1,0).
      P is NITEMS x 1, so it is transposed to conform with
      the 1 x NITEMS vector rrv.
    +-----------------------------------------------------------+;
  score = t(P) > rrv;
finish;
use params;
* +-----------------------------------------------------------+
    Reading in the vectors of item parameters
  +-----------------------------------------------------------+;
read all var {a1 a2 a3 a4 a5 a6} into popa;
read all var {b} into popb;
read all var {c} into popc;
nitms = nrow(popa);
* This loop generates six theta values for each examinee
  and a set of NITMS random numbers;
DO REP = 1 TO 100;
DO I = 1 to 1000;
  seed1 = round(100000000*ranuni(0));
  idn2 = i;
  * Generation of theta values from N(0,1) distribution;
  sim1 = rannor(seed1);
  sim2 = rannor(seed1);
  sim3 = rannor(seed1);
  sim4 = rannor(seed1);
  sim5 = rannor(seed1);
  sim6 = rannor(seed1);
  simulees = sim1//sim2//sim3//sim4//sim5//sim6;
  * Generation of uniform random numbers for each person and
    each test item. These are used to determine item response
    correctness;
  rrv = J(1,nitms,0);
  do k = 1 to nitms;
    rrv[1,k] = RANUNI(seed1);
  end;
  * +--------------------------------------+
      Call the scoring subroutine
    +--------------------------------------+;
  run irtscore (nitms, simulees, idn2, rrv, popa, popb, popc, score);
  * +-----------------------------------------------------+
      Create variables for the output data file
    +-----------------------------------------------------+;
  idnum = idn2[1,1];
  thet1 = simulees[1,1];
  thet2 = simulees[2,1];
  thet3 = simulees[3,1];
  thet4 = simulees[4,1];
  thet5 = simulees[5,1];
  thet6 = simulees[6,1];
  itm1 = score[1,1];
  itm2 = score[1,2];
  itm3 = score[1,3];
  itm4 = score[1,4];
  itm5 = score[1,5];
  [ etc. for each item]
  itm79 = score[1,79];
  itm80 = score[1,80];
  file print;
  put @1 idnum 4. @6 thet1 12.8
      @20 itm1 1. @21 itm2 1. @22 itm3 1. @23 itm4 1.
      [ etc. for each item]
      @98 itm79 1. @99 itm80 1.
      @110 thet2 12.8 @125 thet3 12.8 @140 thet4 12.8
      @155 thet5 12.8 @170 thet6 12.8;
end;
end;
quit;

CONCLUSION

GENMIRT.SAS provides a simple vehicle for the simulation of realistic examinee test item responses.
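For readers who want to trace the logic of the listing outside SAS, the scoring subroutine and the examinee loop can be sketched in Python. This is a hedged illustration, not a port of GENMIRT.SAS: the names `norm_cdf`, `irt_score`, and `simulate` are hypothetical, the three two-dimensional items are invented values rather than NOHARM output, the output-file step is omitted, and Python's random number generator will not reproduce RANNOR or RANUNI streams. As in the listing, the b column of the parameter file is added to the weighted sum of abilities before the normal ogive is applied.

```python
import math
import random


def norm_cdf(x):
    """Standard normal CDF, playing the role of SAS PROBNORM."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def irt_score(theta, uniforms, popa, popb, popc):
    """Score one examinee on every item: 1 where P > U, else 0.

    theta    : ability values, one per dimension
    uniforms : one uniform(0, 1) draw per item
    popa     : per-item lists of discrimination parameters
    popb     : per-item values from the b column of the parameter file
    popc     : per-item lower asymptotes
    """
    scores = []
    for a, b, c, u in zip(popa, popb, popc, uniforms):
        z = b + sum(a_k * t_k for a_k, t_k in zip(a, theta))
        p = c + (1.0 - c) * norm_cdf(z)
        scores.append(1 if p > u else 0)
    return scores


def simulate(popa, popb, popc, n_examinees, seed):
    """Generate (id, theta, scores) records for one sample of examinees."""
    rng = random.Random(seed)
    n_dims = len(popa[0])
    records = []
    for idn in range(1, n_examinees + 1):
        theta = [rng.gauss(0.0, 1.0) for _ in range(n_dims)]  # N(0,1) abilities
        uniforms = [rng.random() for _ in popa]               # one U per item
        records.append((idn, theta, irt_score(theta, uniforms, popa, popb, popc)))
    return records


# Hypothetical parameters for three items on two dimensions (not from the paper).
popa = [[1.0, 0.2], [0.5, 0.9], [0.3, 1.1]]
popb = [0.5, 0.0, -0.8]
popc = [0.2, 0.2, 0.25]

sample = simulate(popa, popb, popc, n_examinees=5, seed=2024)
for idn, theta, scores in sample:
    print(idn, [round(t, 3) for t in theta], scores)
```

Wrapping `simulate` in an outer loop over replications would correspond to the DO REP loop of the SAS program.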
The data simulated by this program may be used for research on a variety of issues related to psychometrics, such as the accuracy and precision of methods to estimate examinee ability, strategies for test equating, phenomena associated with computer adaptive testing algorithms, and techniques to detect differential item functioning.

REFERENCES

Ackerman, T. A. (1994a). Using multidimensional item response theory to understand what items and tests are measuring. Applied Measurement in Education, 7(4), 255-278.

Ackerman, T. A. (1994b). Creating a test information profile for a two-dimensional latent space. Applied Psychological Measurement, 18(3), 257-275.

Davey, T., Nering, M., & Thompson, T. (1997, March). Realistic simulation of item response data. Paper presented at the annual meeting of the National Council on Measurement in Education, Chicago, IL.

Fraser, C. & McDonald, R. (1986). NOHARM II: A FORTRAN program for fitting unidimensional and multidimensional normal ogive models of latent trait theory. Armidale, Australia: University of New England, Center for Behavioral Studies.

Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Sage Publications.

Lord, F. M. (1952). A theory of test scores. Psychometric Monograph, 7.

Lord, F. M. (1953a). An application of confidence intervals and maximum likelihood to the estimation of an examinee's ability. Psychometrika, 18, 57-75.

Lord, F. M. (1953b). The relation of test score to the trait underlying the test. Educational and Psychological Measurement, 13, 517-548.

Miller, T. R. & Hirsch, T. M. (1992). Cluster analysis of angular data in applications of multidimensional item-response theory. Applied Measurement in Education, 5(3), 193-211.

Mislevy, R. J. & Bock, R. D. (1990). BILOG3: Item analysis and test scoring with binary logistic models. [Computer program]. Chicago, IN: Scientific Software.

Parshall, C. G., Davey, T., & Nering, M. (1998, April). Test development exposure control for adaptive testing. In T. Miller (chair), Adaptive Testing Research at ACT. Symposium conducted at the annual meeting of the National Council on Measurement in Education, San Diego, CA.

Parshall, C. G., Kromrey, J. D., Chason, W. M., & Yi, Q. (1997, June). Evaluation of parameter estimation under modified IRT models and small samples. Paper presented at the annual meeting of the Psychometric Society, Gatlinburg, TN.

Reckase, M. D. (1985, April). The difficulty of test items that measure more than one ability. Paper presented at the annual meeting of the American Educational Research Association, Chicago, IL.

Reckase, M. D., Ackerman, T. A., & Carlson, J. E. (1988). Building a unidimensional test using multidimensional items. Journal of Educational Measurement, 25(3), 193-203.

Traub, R. E. (1983). A priori considerations in choosing an item response model. In R. K. Hambleton (Ed.), Applications of item response theory (pp. 57-70). Vancouver, BC: Educational Research Institute of British Columbia.

van der Linden, W. J. & Hambleton, R. K. (1996, Eds.). Handbook of modern item response theory. New York, NY: Springer-Verlag.

Wilson, D., Wood, R., & Gibbons, R. (1984). TESTFACT: Test scoring and full-information item factor analysis. [Computer program]. Mooresville, IN: Scientific Software, Inc.

Yi, Q. & Nering, M. (1998a, April). Nonmodel-fitting responses and robust ability estimation in a realistic CAT environment. Paper presented at the annual meeting of the American Educational Research Association, San Diego, CA.

Yi, Q. & Nering, M. (1998b, April). The impact of nonmodel-fitting responses in a realistic CAT environment. In M. Nering (chair), Innovations in person-fit research. Symposium conducted at the annual meeting of the National Council on Measurement in Education, San Diego, CA.

SAS/IML is a registered trademark of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

CONTACT INFORMATION

The authors can be contacted at the University of South Florida, Department of Educational Measurement and Research, FAO 100U, 4202 East Fowler Ave., Tampa, FL 33620, by telephone (813) 974-3220, or Jeff can be contacted by e-mail: kromrey@typhoon.coedu.usf.edu