A Coding Practice for Preparing Adaptive Multistage Testing Yung-chen Hsu, GED Testing Service, LLC, Washington, DC


SESUG 2011
Paper PO-3

ABSTRACT

The purpose of this paper is to present a simulation study of a coding practice for preparing adaptive multistage testing (MST) designs for a credentialing testing program in the coming years. MST is an adaptive test administration method in which a test form is tailored as a sequence of pre-constructed modules at the item set level. At each adaptation point, a module is selected to match the proficiency estimate of the examinee based on cumulative performance on previously administered modules. For some testing programs, MST is considered a better fit in their future test development because the test delivery model offers a balanced tradeoff and a promising amelioration between the computerized adaptive tests and the traditional linear fixed-length tests. In the simulation, a macro is developed to estimate the proficiency scores based on item response theory. The algorithm is implemented with PROC IML using the Newton-Raphson method. To assess the classification consistency and decision accuracy for examinees, kappa coefficients from PROC FREQ and additional consistency measures are computed to more fully characterize the extent of the agreement. Practical policy questions and test development considerations are also discussed.

INTRODUCTION

For many credentialing testing programs, using computers to administer exams is a trend in the coming years. There are good practical reasons why adopting computer-based test administration is preferred, which include automated scoring and fast reporting, flexible exam schedules and locations, and potentially higher efficiency and more precise proficiency estimation of examinees through computer-based or adaptive testing. Several innovative test delivery models were considered in practice, such as computerized fixed testing (CFT), item-level computer-adaptive testing (CAT), and multistage testing (MST) (Hendrickson, 2007; Jodoin, Zenisky, and Hambleton, 2006). A CFT is analogous to the fixed-item paper-and-pencil test (PPT) but with more modern varieties that can be administered. For example, different examinees may take different forms of the test or receive the items in different orders. In contrast, CAT adapts the difficulty of the test and presents each new item based on the proficiency estimate from an examinee's performance on previous items. CAT generally uses far fewer test items than CFT does and has the advantage of offering improved efficiency in estimating an examinee's proficiency level. However, there are potential psychometric issues and practical shortcomings found in past CAT practices, such as item exposure control and content balancing. Besides, data management effort and deployment cost are business and financial concerns for administering the test in an operational environment.

To balance the tradeoff between CFT and CAT, MST was proposed. MST is a test administration method closely related to CAT but with test adaptation at the item set level instead. For some examination programs MST is considered a better delivery model in test development. MST may help ameliorate the problems encountered in a traditional CAT yet still offers better testing efficiency than PPT or CFT. Depending on the nature of the test and the practical policy of the testing program, there are practical test development considerations related to the design and implementation. This study is a coding practice using simulated data to prepare information for decision makers of a credentialing testing program in the future.

ADAPTIVE MULTISTAGE TESTS

Figure 1 depicts the generalized procedure of adaptive multistage test design. A test form consists of a series of test modules, and each test taker would potentially take a different set of modules that is best targeted to the individual's ability. MST starts with an initial test module for all examinees. The initial module commonly contains items with moderate difficulty at the median proficiency of the intended group or a broad range of difficulty values. With the examinee's performance on the initial module, a proficiency score can be estimated. The proficiency estimate is then used to select the next module that matches the examinee's proficiency level. Normally, the accumulated performance is used to estimate the proficiency and to choose a module with a narrower, more focused difficulty range in each round until the test ends.
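The routing procedure described above can be sketched in a few lines. This is an illustrative Python sketch rather than the SAS code used in the paper, and it rests on assumptions that do not come from the paper: the function names, the cut scores, the module contents, and the grid-search estimator (a simple stand-in for any proficiency estimator) are all hypothetical.

```python
import math


def rasch_p(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(b - theta))


def estimate_theta(responses, difficulties, lo=-5.0, hi=5.0, steps=1001):
    """Grid-search maximum likelihood estimate (stand-in for any estimator)."""
    best_theta, best_ll = lo, -math.inf
    for k in range(steps):
        theta = lo + (hi - lo) * k / (steps - 1)
        ll = 0.0
        for u, b in zip(responses, difficulties):
            p = rasch_p(theta, b)
            ll += math.log(p) if u == 1 else math.log(1.0 - p)
        if ll > best_ll:
            best_theta, best_ll = theta, ll
    return best_theta


def select_module(theta_hat, cuts):
    """Index of the next-stage module whose range contains theta_hat."""
    for idx, cut in enumerate(cuts):
        if theta_hat < cut:
            return idx
    return len(cuts)


def run_mst(stages, cuts, answer):
    """Administer stage by stage, re-estimating from all responses so far.

    stages[0] holds the single initial module; each later stage holds
    len(cuts) + 1 candidate modules (lists of item difficulties).
    """
    items, responses = [], []
    module = stages[0][0]
    for s in range(len(stages)):
        for b in module:
            items.append(b)
            responses.append(answer(b))
        theta_hat = estimate_theta(responses, items)
        if s + 1 < len(stages):
            module = stages[s + 1][select_module(theta_hat, cuts)]
    return theta_hat, len(items)


# Hypothetical two-stage design: one routing module, three second-stage modules.
stages = [
    [[-1.0, 0.0, 1.0]],
    [[-2.0, -1.5], [-0.5, 0.5], [1.5, 2.0]],
]
cuts = [-1.0, 1.0]
theta_hat, n_items = run_mst(stages, cuts, answer=lambda b: 1 if b < 0.5 else 0)
print(theta_hat, n_items)
```

The estimate is recomputed from the cumulative responses at each stage, which mirrors the use of accumulated performance described in the text. An operational design would replace the fixed cut scores with routing rules derived from the module information curves.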

Figure 1. MST procedure (flowchart: select the first stage test module; administer the test; estimate proficiency; if the test has ended, report the final proficiency; otherwise select the next stage test module and administer again).

SIMULATION

A simulation using the two-stage test design was conducted to demonstrate the procedure for the preparation work of the test development. The Rasch model is used in this simulation. It is the simplest item response theory (IRT) model for dichotomous items, having only one parameter for the examinee and one for the item, the latter generically referred to as a threshold. Mathematically, the Rasch model can be expressed as

$$P(u_{ij} = 1 \mid \theta_j) = \frac{1}{1 + \exp(b_i - \theta_j)},$$

representing the probability of answering a particular dichotomously scored item correctly given the proficiency level of a test taker, where $b_i$ is the difficulty of item $i$ and $\theta_j$ is the ability of person $j$. The steps of conducting the simulation study are outlined as follows:

Data preparation: Simulate true proficiency scores ($\theta_t$), item parameters ($b_i$), item responses ($u_{ij}$), true group ID, a single stage module, and both first and second stage modules.

Proficiency estimation: Use $u$ and $b$ from the first stage to estimate the proficiency score $\theta_1$. Assign second stage modules based on $\theta_1$. Combine data from both stages and estimate the final proficiency score $\theta_{12}$. Also estimate single stage proficiency scores $\theta_0$ for comparison.

Evaluation: Calculate psychometric properties and related statistics for evaluation.

DATA PREPARATION

To simulate true proficiency scores for 3,000 test candidates, $\theta_t$ are generated from a normal distribution N(0,3) with predefined upper and lower bounds. We assume that the test will be used to classify the candidates into three groups: A, B, and C (e.g., pass advanced, pass, and fail). The candidates are divided into three groups with 1,000 each according to the true proficiency scores $\theta_t$. Three sets of item parameters $b$ are also generated from a normal distribution with different means and bounds. The three sets are combined to form a pool of 60 items. Then, the responses $u$ are generated from $\theta_t$ and $b$ using the Rasch model. One first stage module, which contains 30 items, and three second stage modules with 20 items each are assigned. A 50-item single-stage module is also assigned for comparison. The first stage module is designed to contain a broad range of difficulty values, while the second stage modules have more items with difficulty located near the average proficiency scores of the respective group. Figures 2 and 3 illustrate the test information curves of the first and second stage modules, respectively. The test information is simply the sum over items of the amount of item information. Namely,

$$I(\theta) = \sum_i I_i(\theta).$$

The item information function is defined as

$$I_i(\theta) = \frac{[P_i'(\theta)]^2}{P_i(\theta)\,Q_i(\theta)},$$

where $Q_i(\theta) = 1 - P_i(\theta)$ and $P_i'(\theta) = \partial P_i(\theta)/\partial\theta$. For the Rasch model, the expression is simply

$$I_i(\theta) = P_i(\theta)\,Q_i(\theta).$$

Figure 2. Test information curve of the first stage module (horizontal axis: theta, from -5 to 5).

Figure 3. Test information curves of the second stage modules (horizontal axis: theta, from -5 to 5).
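As a rough illustration of the data preparation step, the following Python sketch (not the paper's SAS code) draws bounded normal proficiencies, generates Rasch responses, and evaluates the test information of a module. The mean, standard deviation, bounds, seed, and module difficulties are hypothetical values chosen only for the example, and the function names are assumptions.

```python
import math
import random


def rasch_p(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(b - theta))


def bounded_normal(rng, mean, sd, lo, hi):
    """Draw from a normal distribution, rejecting values outside [lo, hi]."""
    while True:
        x = rng.gauss(mean, sd)
        if lo <= x <= hi:
            return x


def simulate_responses(thetas, difficulties, rng):
    """One row of 0/1 responses per examinee, one column per item."""
    return [
        [1 if rng.random() < rasch_p(t, b) else 0 for b in difficulties]
        for t in thetas
    ]


def module_information(theta, difficulties):
    """Test information: sum over items of the Rasch item information P*Q."""
    total = 0.0
    for b in difficulties:
        p = rasch_p(theta, b)
        total += p * (1.0 - p)
    return total


# Hypothetical settings, for illustration only.
rng = random.Random(2011)
thetas = [bounded_normal(rng, 0.0, 1.0, -4.0, 4.0) for _ in range(5)]
module = [-1.0, -0.5, 0.0, 0.5, 1.0]
responses = simulate_responses(thetas, module, rng)

# Points on the test information curve over theta = -5, ..., 5.
curve = [(t, module_information(t, module)) for t in range(-5, 6)]
```

Evaluating `module_information` over a grid of theta values gives the kind of curve shown in Figures 2 and 3: a module with widely spread difficulties yields a flat, broad curve, while a module concentrated near one proficiency level yields a narrow peak there.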

PROFICIENCY ESTIMATION

The measure of the proficiency or ability of a given examinee is the maximum likelihood estimate based on the responses to the items and the values of the parameters of the items. For a test module with $N$ items, $u_i \in \{0, 1\}$ refers to a test taker's response to item $i$, which is scored dichotomously. Under the assumption of local independence, the probability of the vector of item responses $U = (u_1, u_2, \ldots, u_N)$ is given by the likelihood function

$$\Pr(U \mid \theta) = \prod_{i=1}^{N} P_i^{u_i}\, Q_i^{1-u_i},$$

where $P_i$ is the Rasch function, $Q_i = 1 - P_i$, and $\theta$ is the ability of the test taker. The derivative of the log-likelihood function with respect to the test taker's ability is

$$\frac{\partial L}{\partial \theta} = \sum_{i=1}^{N} u_i \frac{1}{P_i} \frac{\partial P_i}{\partial \theta} + \sum_{i=1}^{N} (1 - u_i) \frac{1}{Q_i} \frac{\partial Q_i}{\partial \theta},$$

where $L$ is the natural logarithm of the likelihood function $\Pr$. For the Rasch model, we have

$$\frac{\partial P_i}{\partial \theta} = P_i Q_i \quad \text{and} \quad \frac{\partial Q_i}{\partial \theta} = -P_i Q_i.$$

Then

$$\frac{\partial L}{\partial \theta} = \sum_{i=1}^{N} (u_i - P_i) \quad \text{and} \quad \frac{\partial^2 L}{\partial \theta^2} = -\sum_{i=1}^{N} P_i Q_i.$$

Using a Taylor series expansion to solve the likelihood equation, we have

$$L'(\theta) \approx L'(\theta_0) + L''(\theta_0)\,(\theta - \theta_0) = 0,$$

where $L' = \partial L/\partial\theta$, $L'' = \partial^2 L/\partial\theta^2$, and $\theta_0$ can be viewed as a trial value for the root of $L'(\theta) = 0$. With $\theta_n$ denoting the trial value at the $n$th step, the approximate value at the next step, $\theta_{n+1}$, can be derived from

$$\theta_{n+1} = \theta_n + \frac{\sum_{i=1}^{N} (u_i - P_i)}{\sum_{i=1}^{N} P_i Q_i}$$

using the second-order approximation. The above iterative scheme is known as the Newton-Raphson method, and the process must be repeated until $|\theta_{n+1} - \theta_n|$ becomes sufficiently small.

A SAS/IML module, which implements the Newton-Raphson method, is used to simplify the task. The following statements show a macro that calls the IML module to estimate the ability for every test taker in an iteration loop. The initial trial values are all set to zero in the macro. To improve the efficiency, they can first be replaced by estimates based on the total score or other means.

    %macro rbtrasch(   /* Ability estimation */
        dsr=,          /* Item response      */
        dsp=,          /* Item parameter     */
        dst=           /* Ability            */
    );
    proc iml;
        nmaxiter=30;     * max iteration number;
        mndelta=0.01;    * convergence criterion;
        ubtheta=5;       * theta upper bound;
        lbtheta=-5;      * theta lower bound;

        use &dsr; read all var _num_ into r;
        use &dsp; read all var _num_ into b;
        ntakers=nrow(r);
        nitems=ncol(r);
        nitems1=nrow(b);

        * Error check;
        if nitems^=nitems1 then do;
            print "ERROR: Inconsistent inputs";
            stop;
        end;

        * Newton-Raphson equation iteration loop;
        start rascht(t0,pb,r,mxt,ubt,lbt,mnd);
            t=t0; nt=1; n=nrow(pb);
            do while(nt<=mxt);
                snum=0.0; sdem=0.0;
                do i=1 to n;
                    p=1.0/(1.0+exp(pb[i]-t));   * Rasch probability;
                    w=p*(1.0-p);                * item information P*Q;
                    v=r[i]-p;                   * residual u - P;
                    snum=snum+v;
                    sdem=sdem+w;
                end;
                dta=snum/sdem;
                * Check convergence (forces the loop to exit after this update);
                if abs(dta)<mnd then nt=mxt;
                * Update and keep theta within bounds;
                t=t+dta;
                if t>ubt then t=ubt;
                else if t<lbt then t=lbt;
                nt=nt+1;
            end;
            return (t);
        finish rascht;

        * Initial estimate;
        t0=j(ntakers,1,0);

        * Loop through every test taker;
        theta=j(ntakers,1,0);
        do j=1 to ntakers;
            theta[j]=rascht(t0[j],b,r[j,],nmaxiter,ubtheta,lbtheta,mndelta);
        end;

        create &dst from theta[colname='theta'];
        append from theta;
        close &dst;
    quit;
    run;
    %mend rbtrasch;

EVALUATION

The correlation matrix of the true proficiency scores, the estimated scores from the two-stage (30 and 20 items) test, and the estimated scores from the single-stage test, obtained with PROC CORR, is provided in Table 1. The correlation between the true score and the two-stage proficiency estimates is higher than the correlation between the true score and the single-stage proficiency estimates.

Table 1. Correlation matrix of true scores, two-stage, and single-stage proficiency estimates

                    True score    Two-stage
    Two-stage       0.85134
    Single-stage    07085         0.79453
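For readers without SAS, the estimation and comparison steps can be approximated as follows. This is a minimal Python sketch under stated assumptions, not the paper's implementation: the function names, seed, sample size, item difficulties, and tolerance are hypothetical. It applies the Newton-Raphson update $\theta_{n+1} = \theta_n + \sum(u_i - P_i) / \sum P_i Q_i$ with theta kept inside fixed bounds, then computes a Pearson correlation between true and estimated scores.

```python
import math
import random


def rasch_p(theta, b):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(b - theta))


def newton_raphson_theta(responses, difficulties, theta0=0.0,
                         max_iter=30, tol=0.01, lo=-5.0, hi=5.0):
    """Maximum likelihood ability estimate via Newton-Raphson."""
    theta = theta0
    for _ in range(max_iter):
        num = 0.0  # first derivative of the log-likelihood: sum(u - P)
        den = 0.0  # negative second derivative: sum(P * Q)
        for u, b in zip(responses, difficulties):
            p = rasch_p(theta, b)
            num += u - p
            den += p * (1.0 - p)
        step = num / den
        # Bounds stop all-correct or all-wrong patterns from diverging.
        theta = min(max(theta + step, lo), hi)
        if abs(step) < tol:
            break
    return theta


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical 30-item module and 200 simulated examinees.
rng = random.Random(0)
difficulties = [-2.0 + 4.0 * k / 29 for k in range(30)]
true_thetas = [rng.gauss(0.0, 1.0) for _ in range(200)]
estimates = []
for t in true_thetas:
    u = [1 if rng.random() < rasch_p(t, b) else 0 for b in difficulties]
    estimates.append(newton_raphson_theta(u, difficulties))

print(round(pearson(true_thetas, estimates), 3))
```

Perfect and zero scores have no finite maximum likelihood estimate, so the sketch returns the upper or lower bound for them. The values in Table 1 come from the paper's own simulation and are not expected to match this toy example.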

For most credentialing testing programs the decision accuracy for classifying candidates is crucial. We assumed that both A and B groups are collapsed as Pass, and the C group is Fail. Then Cohen's kappa coefficient, which provides a measure of agreement, can be obtained from PROC FREQ with the TEST KAPPA statement as

$$\kappa = \frac{P_o - P_c}{1 - P_c},$$

where $P_o$ is the observed agreement and $P_c$ is the chance agreement. The values of kappa range from -1 to +1. However, a negative kappa is unusual in practice, because it occurs only when the observed agreement is less than chance agreement. A number of studies (Sim and Wright, 2005; Viera and Garrett, 2005) show that other factors can influence the magnitude of kappa and suggest reporting additional indices, such as the prevalence index and the bias index, to provide a clearer picture. Both are included in Table 2, although low kappa and high prevalence are very rare in most well designed educational assessment programs. The decision accuracy of the two-stage case is slightly higher, but the difference is not significant in the simulation. The mean and standard deviation of the difference from the true score are also provided, even though accurate proficiency estimates are less critical for most credentialing tests. The results show that the two-stage design yields more accurate estimates.

Table 2. Cohen's kappa, prevalence index, and bias index

                         Cohen's kappa   Prevalence index   Bias index   Mean    Standard deviation
    True/Two-stage       0.8516          0357               0.0077       0341    1.0768
    True/Single-stage    0.8475          0.3357             0003         0486    1657

CONCLUSION

This simulation study is a coding practice of preliminary preparation work for a credentialing testing program in the coming years. MST is being considered and is expected to have some distinct advantages over conventional fixed-length testing. In the test development, there are many issues to resolve. In order to provide information for decision making, parameters and data will be adjusted accordingly when more information, such as data collected from field testing, item characteristics in the future item bank, and data derived from previous tests, becomes available. This paper illustrates the procedure using SAS in preparing information for making decisions and outlines some steps for use in the development.

REFERENCES

Hendrickson, A. (2007). An NCME instructional module on multistage testing. Educational Measurement: Issues and Practice, 26(2), 44-52.

Jodoin, M. G., Zenisky, A., and Hambleton, R. K. (2006). Comparison of the psychometric properties of several computer-based test designs for credentialing exams with multiple purposes. Applied Measurement in Education, 19(3), 203-220.

Sim, J. and Wright, C. C. (2005). The kappa statistic in reliability studies: Use, interpretation, and sample size requirements. Physical Therapy, 85(3), 257-268.

Viera, A. J. and Garrett, J. M. (2005). Understanding interobserver agreement: The kappa statistic. Family Medicine, 37(5), 360-363.

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the author at:

Yung-chen Hsu
GED Testing Service, LLC
One Dupont Circle NW
Washington, DC 20003
Work Phone: 202-939-9717
E-mail: yung-chenhsu@gedtestingservice.com
Web: www.gedtestingservice.com

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.