Compensatory and Noncompensatory Multidimensional Randomized Item Response Analysis of the CAPS and EAQ Data

Fox, Klein Entink, Avetisyan, BJMSP (March, 2013).

This document provides a step-by-step analysis of the CAPS and EAQ data using the compensatory and noncompensatory multidimensional randomized item response models. The summarized output and the description of the model are given in Fox et al. (2013, BJMSP). Here, a more detailed description is given of the function calls and the output. This document is not meant to serve as a program manual, since it only provides information about the specific model analysis described in the paper.

REAL Data Study:

The R workspace with the stored data objects is called DataObjects.Rdata:

load("DataObjects.Rdata")   # load data set

The response matrix Y is N (793) by K (17), with response options given as integers from 1 to 5. The data object X contains a column of ones (intercept), an indicator variable that equals zero when the subject responded using the randomizing device (RRT = 0) and one when the subject responded directly, and a third column that equals one when the subject is female and zero when male. When there are no explanatory variables, an intercept still needs to be defined through a column of ones.

An RR object is made that specifies which responses were given via the randomized response procedure (RR = 1) and which via direct questioning (RR = 0):

RR <- matrix(0, ncol=K, nrow=N)
RR[which(X[,2]==0), 1:K] <- 1

The program is loaded via

## load MIRT-RR script
source("MIRT-RR.Real.R")

The two-factor (Q=2) model is estimated through the following call, making XG=50,000 iterations:

outq2 <- MIRTRR(Y, X, Q=2, XG=50000, RR)

Object outq2 contains the sampled values from the posterior distributions. We can summarize results by computing means and variances. To estimate the factor loadings, we use a burn-in of 5,000 iterations:

aa <- matrix(apply(outq2$ma[5000:50000,], 2, mean), ncol=2, nrow=K)

The factor loadings can be standardized as follows:

aa <- aa/sqrt(apply(aa**2, 1, sum))

The factor loadings are defined to be positive on their main dimension, depending on the solution found. (When reflecting a dimension, multiply the corresponding theta values by -1 and the regression effects by -1.)
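If a dimension comes out with mainly negative loadings, the whole solution for that dimension can be reflected without changing the model fit. A minimal sketch of such a reflection, assuming the standardized loadings aa from above and that the second dimension is the one to be reflected (the person parameters and the regression effects of that dimension, used further below, would be multiplied by -1 in the same way):

## reflect dimension 2: loadings, person parameters and regression effects all change sign
flip <- 2
aa[, flip] <- -aa[, flip]
## the corresponding posterior mean regression effects (see below) would then be reported as
## -apply(outq2$mbreg[10000:50000,, flip], 2, mean)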

aanames <- c("Item 1","Item 2","Item 3","Item 4","Item 5","Item 6","Item 7","Item 8","Item 9","Item 10","Item 11","Item 12","Item 13","Item 14","Item 15","Item 16","Item 17")
names(aa) <- NULL
rownames(aa) <- aanames

plot(aa[,2]/mdisc2, aa[,1]/mdisc2, xlab="Sexual Enhancement Expectancy",
     ylab="Socio-emotional/Community Problems",
     main="Estimated Standardized Factor Loadings", xlim=c(-0.25,1.1), ylim=c(.25,1.1))
text(aa[,2]/mdisc2, aa[,1]/mdisc2, row.names(aa), cex=0.8, pos=4, col="black")
abline(a=0, b=0, lwd=.75, lty=3)

This plot corresponds with Figure 2 in the paper, which displays the estimated factor loadings. Other estimates can be retrieved as follows and are given in Table 3 under the column Two-Factor:

# Covariance matrix of the person parameters
matrix(apply(outq2$msigmap[10000:50000,], 2, mean), ncol=2)   # posterior mean
     [,1] [,2]
[1,] 0.98 0.65
[2,] 0.65 0.98

sqrt(matrix(apply(outq2$msigmap[10000:50000,], 2, var), ncol=2))   # posterior standard deviation
     [,1] [,2]
[1,] 0.05 0.07
[2,] 0.07 0.05

# Covariate effects factor one
apply(outq2$mbreg[10000:50000,,1], 2, mean)        # posterior mean effects
 0.00 (Intercept)   -0.20 (Direct questioning)   -0.00 (Female)
sqrt(apply(outq2$mbreg[10000:50000,,1], 2, var))   # posterior standard deviations
 0.00 (Intercept)    0.09 (Direct questioning)    0.06 (Female)

# Covariate effects factor two
apply(outq2$mbreg[10000:50000,,2], 2, mean)        # posterior mean effects
 0.00 (Intercept)   -0.22 (Direct questioning)    0.03 (Female)
sqrt(apply(outq2$mbreg[10000:50000,,2], 2, var))   # posterior standard deviations
 0.00 (Intercept)    0.06 (Direct questioning)    0.04 (Female)

# -2 log-likelihood
mean(outq2$mloglik[10000:50000])*(-2)

# Item thresholds
matrix(apply(outq2$mbeta[10000:50000,], 2, mean), nrow=K, ncol=4)
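The same pattern of a posterior mean and a posterior standard deviation after a burn-in recurs for every block of parameters. A small helper function can shorten these calls; this is a sketch and not part of the original scripts, assuming only a matrix of MCMC draws with one column per parameter:

## posterior mean and SD per column of a matrix of MCMC draws, after discarding `burnin` iterations
postsum <- function(draws, burnin = 10000) {
  kept <- draws[(burnin + 1):nrow(draws), , drop = FALSE]
  cbind(mean = apply(kept, 2, mean), sd = apply(kept, 2, sd))
}
## for example, the covariance elements of the person parameters:
round(postsum(outq2$msigmap), 2)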

The three-factor (Q=3) model is estimated through the following call, making XG=50,000 iterations:

outq3n <- MIRTRR(Y, X, Q=3, XG=50000, RR)
aa <- matrix(apply(outq3n$ma[10000:50000,], 2, mean), ncol=3, nrow=K)
aa <- aa/sqrt(apply(aa**2, 1, sum))

These factor loadings are given in Table 2, where they are scaled to be positive for the main dimension they relate to. Without repeating all results, the parameter estimates in Table 3 under the label Three-Factor follow from the following commands:

## Covariance matrix of the person parameters
matrix(apply(outq3n$msigmap[10000:50000,], 2, mean), ncol=3)        # posterior means
sqrt(matrix(apply(outq3n$msigmap[10000:50000,], 2, var), ncol=3))   # posterior standard deviations

# Covariate effects and standard deviations related to factor one
apply(outq3n$mbreg[10000:50000,,1], 2, mean)
sqrt(apply(outq3n$mbreg[10000:50000,,1], 2, var))

# Covariate effects and standard deviations related to factor two
apply(outq3n$mbreg[10000:50000,,2], 2, mean)
sqrt(apply(outq3n$mbreg[10000:50000,,2], 2, var))

# Covariate effects and standard deviations related to factor three
apply(outq3n$mbreg[10000:50000,,3], 2, mean)
sqrt(apply(outq3n$mbreg[10000:50000,,3], 2, var))

# -2 log-likelihood
mean(outq3n$mloglik[10000:50000])*(-2)

## Define variables using effects coding for the clustering of students in universities
X3 <- matrix(0, ncol=3, nrow=N)
X3[which(data$SCHOOL==1), 1]   <- 1    # Elon
X3[which(data$SCHOOL==2), 2]   <- 1    # GTCC
X3[which(data$SCHOOL==3), 3]   <- 1    # Wake Forest
X3[which(data$SCHOOL==4), 1:3] <- -1   # UNCG
X31 <- matrix(c(X[,-3], X3), ncol=5, nrow=N)

To obtain the results in Table 4:

outq2sch <- MIRTRR(Y, X31, Q=2, XG=50000, RR)   # two-factor model
outq3sch <- MIRTRR(Y, X31, Q=3, XG=50000, RR)   # three-factor model

The results are summarized in the same way as above.
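Under effects coding the four school effects sum to zero, so the effect for UNCG is not estimated directly but follows from the other three. A minimal sketch, assuming the three school indicators end up as the last three columns (columns 3 to 5) of the regression-effect draws in outq2sch$mbreg, matching the construction of X31 above:

## posterior mean school effects for factor one and the implied UNCG effect
burn <- 10000:50000
sch  <- unname(apply(outq2sch$mbreg[burn,,1], 2, mean))[3:5]   # Elon, GTCC, Wake Forest
uncg <- -sum(sch)                                              # sum-to-zero constraint
round(c(Elon = sch[1], GTCC = sch[2], "Wake Forest" = sch[3], UNCG = uncg), 2)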

Explanation of the R code and additional results: compensatory and noncompensatory multidimensional randomized item response models.

The following describes the steps to arrive at the results presented in the section Simulation Study of the paper by Fox, Klein Entink and Avetisyan (BJMSP, 2013). Furthermore, it provides some additional results that were not presented in the paper.

Step 1. Loading the R scripts

The following R scripts are required:
1. Simdata.R. Contains the function to simulate data corresponding with the multidimensional randomized response model. Source this file in R via source("Simdata.R").
2. MIRT-RR.Simu.R. Contains an adapted version for estimating the presented model that complies with the identification restrictions used to simulate the data. Source this file in R via source("MIRT-RR.Simu.R").
3. Simulation study.r. This file contains the specific R calls (and instructive comments) to generate data, run the model and obtain the results.

Step 2. Simulate data

First, the desired numbers of items, persons, dimensions, response categories and covariates have to be initialised. In R, run:

N <- 750   ## number of test takers
K <- 20    ## number of items
C <- 4     ## number of response categories
Q <- 2     ## number of latent dimensions
P <- 3     ## number of covariates (including 1s for the intercept)

Then generate data with a call to the simdata() function as follows:

# generate data
sim <- simdata(N, K, C, Q=2, P)

## check the simulated data
sim$a        # the simulated loadings
sim$b        # the simulated regression coefficients
sim$sigmap   # the simulated covariance matrix of the person parameters
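A quick sanity check of the simulated data can be added here; this is a sketch, assuming (as in Step 3 below) that the response matrix and design matrix are returned as sim$Y and sim$X:

dim(sim$Y)    # should be 750 x 20 (N x K)
table(sim$Y)  # response frequencies; there should be C = 4 categories
dim(sim$X)    # should be 750 x 3 (N x P), with the first column all ones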

Step 3. Running the model

The model was run twice, with calls to the MIRTRR() function, to be able to obtain MCMC convergence diagnostics. The output was stored in the list objects out1 and out2. In R, run:

## run the model twice
out1 <- MIRTRR(Y=sim$Y, X=sim$X, Q=2, XG=10000)
out2 <- MIRTRR(Y=sim$Y, X=sim$X, Q=2, XG=10000)

Step 4. Analyzing the output

All the steps below can also be found in the Simulation study.r script. The following commands were used to obtain the results presented in the tables.

The estimates for the covariance parameters:

## EAP estimates of the residual covariance matrix
colMeans(out1$msigmap)
## obtain SDs of the covariance parameters
apply(out1$msigmap, 2, sd)

## traceplots of specific covariance parameters can be made as follows:
plot(out1$msigmap[,1])
plot(out1$msigmap[,4])

## Obtain the covariate regression parameter estimates and compare them with the simulated values
colMeans(out1$mbreg)            # EAPs (one per covariate and dimension)
apply(out1$mbreg, c(2,3), sd)   # SDs (one per covariate and dimension)
sim$b                           # simulated values

## Making the traceplots shown in the paper, with a horizontal line depicting the true, simulated
## value. Run the code below and the figures will appear (also presented in Figure 1 below).

trueb <- c(sim$b)[c(2,3,5,6)]         ## the simulated values of the covariate effects
BB  <- out1$mbreg[5001:10000,,]       ## last 5,000 samples of the first MCMC chain
BB1 <- cbind(BB[,2:3,1], BB[,2:3,2])  ## restructure for ease of plotting
BB  <- out2$mbreg[5001:10000,,]       ## last 5,000 samples of the second MCMC chain
BB2 <- cbind(BB[,2:3,1], BB[,2:3,2])

par(mfrow=c(2,2))  ## 2-by-2 layout

y.name <- expression(gamma[11])
plot(BB1[,1], type="l", col="red", xlab="Iteration", ylab=y.name)
lines(BB2[,1], col="blue")
abline(trueb[1], 0)

y.name <- expression(gamma[12])
plot(BB1[,2], type="l", col="red", xlab="Iteration", ylab=y.name)
lines(BB2[,2], col="blue")
abline(trueb[2], 0)

y.name <- expression(gamma[21])
plot(BB1[,3], type="l", col="red", xlab="Iteration", ylab=y.name)
lines(BB2[,3], col="blue")
abline(trueb[3], 0)

y.name <- expression(gamma[22])
plot(BB1[,4], type="l", col="red", xlab="Iteration", ylab=y.name)
lines(BB2[,4], col="blue")
abline(trueb[4], 0)

Figure 1: Traceplots for the regression coefficients.

With the coda R package, convergence diagnostics can be obtained. We do this for the MCMC chains of the regression coefficients as an illustration. The results are presented in Table 1 below.

## Some convergence checks using the coda package
library("coda")
BB1 <- as.mcmc(BB1)
BB2 <- as.mcmc(BB2)
gelman.diag(list(BB1, BB2))
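Other standard diagnostics from the coda package can be applied to the same two chains; for example (a sketch, using the mcmc objects BB1 and BB2 created above):

effectiveSize(mcmc.list(BB1, BB2))   # effective sample size per regression coefficient
geweke.diag(BB1)                     # Geweke convergence z-scores for the first chain
summary(mcmc.list(BB1, BB2))         # posterior means, SDs and quantiles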

Table 1: Gelman's diagnostics for the MCMC chains of the regression coefficients

                    Point est.   Upper C.I.
Gamma_11            1.01         1.03
Gamma_12            1.01         1.02
Gamma_21            1.01         1.04
Gamma_22            1.01         1.05
Multivariate psrf   1.02

To look at the estimated versus the simulated values of the person parameters, we generated Figure 2 below as follows:

### Make plots of simulated versus re-estimated ability parameters
par(mfrow=c(1,2))
x.name <- expression(theta[1]*-simulated)
y.name <- expression(theta[1]*-EAP)
plot(sim$theta[,1], out1$eaptheta[,1], xlab=x.name, ylab=y.name)
abline(a=0, b=1, col="red")
x.name <- expression(theta[2]*-simulated)
y.name <- expression(theta[2]*-EAP)
plot(sim$theta[,2], out1$eaptheta[,2], xlab=x.name, ylab=y.name)

abline(a=0, b=1, col="red")

Figure 2: Simulated and re-estimated (EAP) ability parameters for the 750 persons. The red line is the identity line.

Similarly, a plot of the simulated versus the re-estimated factor loadings was made, as shown in Figure 3 below.

## Plots of simulated versus re-estimated loadings
plot(sim$a[2:20,1], colMeans(out1$ma)[2:20], xlab="Simulated loadings dim 1", ylab="EAP loadings dim 1")
abline(0,1)
plot(sim$a[2:20,2], colMeans(out1$ma)[22:40], xlab="Simulated loadings dim 2", ylab="EAP loadings dim 2")
abline(0,1)

Figure 3: Simulated and re-estimated loadings for the two dimensions.
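As a numerical complement to Figures 2 and 3, the recovery of the person parameters and loadings can be summarized with correlations and root mean squared errors. A short sketch, using the objects created above:

## correlation between simulated and EAP person parameters, per dimension
round(c(cor(sim$theta[,1], out1$eaptheta[,1]),
        cor(sim$theta[,2], out1$eaptheta[,2])), 2)

## root mean squared error of the EAP loadings (items 2 to 20), per dimension
eap.a <- matrix(colMeans(out1$ma), ncol=2)
round(c(sqrt(mean((eap.a[2:20,1] - sim$a[2:20,1])^2)),
        sqrt(mean((eap.a[2:20,2] - sim$a[2:20,2])^2))), 2)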