IRT Models for Polytomous Response Data. American Board of Internal Medicine Item Response Theory Course

Alaina Glenn
6 years ago
1 IRT Models for Polytomous Response Data American Board of Internal Medicine Item Response Theory Course

2 Overview General Theory. Polytomous Data Types & IRT Models: Graded Response, Partial Credit, Nominal Response.

3 General Theory of Polytomous Response Models Item response models for polytomous data extend the general underlying IRT premise (modeling item response behavior as a function of possibly multiple latent traits) to data types that are not limited to binary responses. The type of model depends on the type of data that have been collected.

4 Performance Assessment Assessments that allow examinees more opportunity to demonstrate the skills being measured. They often involve more complicated tasks than answering multiple-choice (MC) questions: essays, projects, writing samples, problem solving. Scores are often assigned in more than two categories (i.e., not just incorrect/correct).

5 Polytomous Data Polytomous data simply means that item-level data are available in more than two categories. Score categories are typically ordered (i.e., higher number = greater score), though this is not a requirement for all models.

6 Problem with IRT Models Dichotomous IRT models are designed only to handle 0-1 data. The assumption of local independence of item responses may be violated for sets of items related to a common stimulus (e.g., reading comprehension items related to a single passage).

7 Polytomous IRT Models These models are able to accommodate polytomous response data from: Multiple-category items. Data created by combining sets of items linked to a common stimulus.

8 Graded Response Model (AKA Ordered Categorical Responses)

9 Graded Responses One of the more straightforward polytomous data types is the graded-response format. Graded response data consist of a score that is an ordinal number, typically ranging from 0 to M: 0, 1, 2, ..., M. Higher scores represent better performance on the item.

10 Example Graded Response Item From the 2006 Illinois Standards Achievement Test (ISAT):

11 ISAT Scoring Rubric

12 Additional Example Item Cognitive items are not the only ones where graded response data occur. Likert-type questionnaires are commonly scored using ordered categorical values. Typically, these ordered categories are treated as continuous data (as with factor analysis). Consider the following item from the Satisfaction With Life Scale (SWLS; Diener, Emmons, Larsen, & Griffin, 1985).

13 SWLS Item #1 I am satisfied with my life. 1. Strongly disagree. 2. Disagree. 3. Slightly disagree. 4. Neither agree nor disagree. 5. Slightly agree. 6. Agree. 7. Strongly agree.

14 Graded Response Model An extension of the 2PL model for graded response data is Samejima's Graded Response Model (1969). The graded response model specifies the likelihood that an examinee of a given ability will provide a response that receives a grade of $x_{ij}$ ($x_{ij} = 0, \dots, M$).

15 Graded Response Model Models the probability of scoring in a given category (x) or higher, given level of ability (θ). For any given category, this will look just like the familiar 2-PL (see next slide).

16 Graded Response Model Samejima's Graded Response Model:
$$P(X_{ij} = x_{ij} \mid \theta_i) = P^*_{x_{ij}}(\theta_i) - P^*_{x_{ij}+1}(\theta_i)$$
Where:
$$P^*_{x_{ij}}(\theta_i) = P(X_{ij} \ge x_{ij} \mid \theta_i) = \frac{e^{D a_j (\theta_i - b_{x_{ij}})}}{1 + e^{D a_j (\theta_i - b_{x_{ij}})}}$$

17 Graded Response Model [Figure: probability of x or higher versus ability (θ)] $b_{jx}$ = location parameter, the category boundary for score x. $b_{jx}$ = the point on the ability scale where P = 0.5.

18 Graded Response Model [Figure: probability of x or higher versus ability (θ)] $a_j$ = discrimination parameter, constant over response categories for a given item ("homogeneous case"). D = scaling factor (D = 1.7).
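The roles of the boundary, discrimination, and scaling parameters can be checked numerically. The following is a minimal sketch, not the course's own code: the function name `grm_cumulative` and the item values are hypothetical, chosen only to show that the "x or higher" curve passes through 0.5 at the boundary.

```python
import math

D = 1.7  # scaling factor given on the slide


def grm_cumulative(theta, a, b, d=D):
    """Probability of scoring in category x or higher (2PL form)."""
    z = d * a * (theta - b)
    # e^z / (1 + e^z) written in the equivalent form 1 / (1 + e^-z)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical item: discrimination 1.2, boundary for score x at 0.5
a_j, b_jx = 1.2, 0.5
p_at_boundary = grm_cumulative(b_jx, a_j, b_jx)  # theta equal to the boundary
p_above = grm_cumulative(1.5, a_j, b_jx)         # theta above the boundary
print(p_at_boundary, p_above)
```

A larger `a` makes the curve steeper around the boundary, while changing `b` shifts it left or right along the ability scale.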

19 Graded Response Model [Figure: probability of x or higher versus ability (θ)] For item j, there are $m_j + 1$ scoring categories. $m_j$ is the highest possible score, 0 is the lowest. There are $m_j$ boundaries between categories.

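20 GRM Parameters
$$P^*_{x_{ij}}(\theta_i) = P(X_{ij} \ge x_{ij} \mid \theta_i) = \frac{e^{D a_j (\theta_i - b_{x_{ij}})}}{1 + e^{D a_j (\theta_i - b_{x_{ij}})}}$$
For each item j: one discrimination parameter $a_j$ and M difficulty parameters $b_1 < \dots < b_M$ (one per boundary between the M + 1 score categories). $P(X_{ij} \ge 0) = 1.0$ and $P(X_{ij} > M) = 0.0$.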

21 Cumulative Category Characteristic Functions [Figure: probability of x or higher versus ability (θ), one curve per score category x]

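22
$$P_{jx}(\theta) = P^*_{jx}(\theta) - P^*_{j(x+1)}(\theta)$$
$$P_{j0}(\theta) = P^*_{j0}(\theta) - P^*_{j1}(\theta) = 1.0 - P^*_{j1}(\theta)$$
$$P_{j1}(\theta) = P^*_{j1}(\theta) - P^*_{j2}(\theta)$$
$$P_{j2}(\theta) = P^*_{j2}(\theta) - P^*_{j3}(\theta)$$
$$P_{j3}(\theta) = P^*_{j3}(\theta) - P^*_{j4}(\theta)$$
$$P_{j4}(\theta) = P^*_{j4}(\theta) - P^*_{j5}(\theta) = P^*_{j4}(\theta) - 0 = P^*_{j4}(\theta)$$
Also, $\sum_{x=0}^{m_j} P_{jx}(\theta) = 1.0$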
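The differencing step on this slide can be sketched in code. This is an illustrative example under stated assumptions: the function name `grm_category_probs` and the boundary values are hypothetical, and D = 1.7 is taken from the earlier GRM slides.

```python
import math


def grm_category_probs(theta, a, bs, d=1.7):
    """Return [P(X=0), ..., P(X=m)] for one item under the GRM.

    bs holds the m ordered boundaries b_1 < ... < b_m.
    """
    # Cumulative probabilities: P*_0 = 1, then the 2PL form, then P*_(m+1) = 0
    cum = [1.0]
    cum += [1.0 / (1.0 + math.exp(-d * a * (theta - b))) for b in bs]
    cum.append(0.0)
    # Each category probability is a difference of adjacent cumulative values
    return [cum[x] - cum[x + 1] for x in range(len(bs) + 1)]


# Hypothetical five-category item (scores 0..4) with symmetric boundaries
probs = grm_category_probs(theta=0.0, a=1.0, bs=[-1.5, -0.5, 0.5, 1.5])
print([round(p, 3) for p in probs])
```

Because the boundaries are ordered, every difference is positive, and the differences telescope so that the category probabilities sum to 1.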

23 Score Category Response Functions [Figure: probability of x versus ability (θ), one curve per score from x = 0 to x = 4]

24 Graded Response Function Open "graded response demo.xls" for the file demo. [Figure: Graded Response IRF; curves for P(X=0|Theta), P(X=1|Theta), P(X=2|Theta), P(X=3|Theta), and P(X=4|Theta) plotted against Theta]

25 Partial Credit Model and Generalized Partial Credit Model

26 Partial Credit Model (PCM) and Generalized PCM Alternative Polytomous Model: Probability of getting a score of x rather than x-1 is given by a 2-PL:
$$\frac{P_{jx}(\theta)}{P_{j(x-1)}(\theta) + P_{jx}(\theta)} = \frac{e^{a_j(\theta - b_{jx})}}{1 + e^{a_j(\theta - b_{jx})}}$$
When $a_j$ is free to vary: GPCM. When $a_j = 1$ for all n items: PCM.

27 GPCM and PCM As before, scores range from 0 to $m_j$. Probability of a score of x = $P_{jx}(\theta)$. The probability of obtaining a score of x, given a score of either x or x-1 and given θ, is modeled by a 2-PL. Dichotomize adjacent categories (e.g., 0-1, 1-2, 2-3, 3-4).

28 Alternative GPCM formulation
$$\frac{P_{jx}(\theta)}{P_{j(x-1)}(\theta) + P_{jx}(\theta)} = \frac{e^{a_j(\theta - b_j + c_{jx})}}{1 + e^{a_j(\theta - b_j + c_{jx})}}$$
where $b_j$ = average difficulty and $c_{jx}$ = the offset of category x from that average. Note: $b_{jx} = b_j - c_{jx}$ (so $c_{jx} = b_j - b_{jx}$) and $\sum_x c_{jx} = 0$. PARSCALE uses this formulation for all polytomous models (GPCM, PCM, and GRM).
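The relation between the category boundaries and the average-plus-offset form can be illustrated with a few lines of code. This is a sketch only: the helper name `to_location_and_offsets` and the boundary values are hypothetical, and it does not reproduce PARSCALE itself.

```python
def to_location_and_offsets(boundaries):
    """Split boundaries b_jx into an average b_j and offsets c_jx = b_j - b_jx."""
    b_avg = sum(boundaries) / len(boundaries)
    offsets = [b_avg - b for b in boundaries]
    return b_avg, offsets


# Hypothetical boundaries for a four-category item
b_jx = [-1.0, 0.2, 1.4]
b_j, c_jx = to_location_and_offsets(b_jx)
# Recover the original boundaries from b_j and the offsets
recovered = [b_j - c for c in c_jx]
print(b_j, c_jx, recovered)
```

Defining the offsets relative to the mean is what makes them sum to zero, so the two parameterizations carry the same information.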

29 Generalized Partial Credit Model
$$\frac{P_{jx}(\theta)}{P_{j(x-1)}(\theta) + P_{jx}(\theta)} = \frac{e^{a_j(\theta - b_{jx})}}{1 + e^{a_j(\theta - b_{jx})}}$$
$$P_{jx}(\theta) = \frac{e^{\sum_{k=0}^{x} a_j(\theta - b_{jk})}}{\sum_{h=0}^{m_j} e^{\sum_{k=0}^{h} a_j(\theta - b_{jk})}}$$
Numerator: sum of $a_j(\theta - b_{jk})$ terms for each category up to x. Denominator: sum of the numerator terms for all possible categories. This is the item score category function.
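The score category function can be sketched as follows. This is a minimal illustration, not a reference implementation: the name `gpcm_category_probs` and the parameter values are hypothetical, and the code assumes the usual convention that the k = 0 term of the sum is fixed at 0, which the slide does not state explicitly.

```python
import math


def gpcm_category_probs(theta, a, bs):
    """Return [P(X=0), ..., P(X=m)] for one item under the GPCM.

    bs holds the m step parameters b_1..b_m; use a = 1 for the PCM.
    """
    # Running sums of a * (theta - b_k); the x = 0 entry is fixed at 0 (assumed)
    sums = [0.0]
    for b in bs:
        sums.append(sums[-1] + a * (theta - b))
    # Subtract the largest sum before exponentiating for numerical stability
    top = max(sums)
    numerators = [math.exp(s - top) for s in sums]
    total = sum(numerators)
    return [n / total for n in numerators]


theta, a_j, steps = 0.3, 1.1, [-1.0, 0.0, 1.0]  # hypothetical values
probs = gpcm_category_probs(theta, a_j, steps)
# Adjacent-category check: P_1 / (P_0 + P_1) should follow a 2PL in b_1
adjacent = probs[1] / (probs[0] + probs[1])
two_pl = 1.0 / (1.0 + math.exp(-a_j * (theta - steps[0])))
print(adjacent, two_pl)
```

The final two values agree, which shows that the closed-form category function is consistent with the adjacent-category 2-PL definition.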

30 GPCM Transition from a score of (x-1) to x is given by a 2-PL model. Known as local estimation, because not all data are incorporated when estimating category boundary parameters (i.e., location or difficulty parameters). Example: when estimating the category difficulty for x=1 vs. x=0, scores of 2, 3, and 4 are ignored.

31 Threshold / Category Boundary $b_{jx}$ is the point on the θ scale where the probability of being in either adjacent category is equal. When $\theta = b_{jx}$:
$$\frac{P_{jx}(\theta)}{P_{j(x-1)}(\theta) + P_{jx}(\theta)} = \frac{1}{2}, \quad \text{so} \quad P_{jx}(\theta) = P_{j(x-1)}(\theta)$$
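The equal-probability property follows in one step from the adjacent-category form of the model. A short derivation, using the same symbols as the slides:

```latex
\frac{P_{jx}(\theta)}{P_{j(x-1)}(\theta) + P_{jx}(\theta)}
  = \frac{e^{a_j(\theta - b_{jx})}}{1 + e^{a_j(\theta - b_{jx})}}
\;\Longrightarrow\;
\frac{P_{jx}(\theta)}{P_{j(x-1)}(\theta)} = e^{a_j(\theta - b_{jx})},
\qquad
\left.\frac{P_{jx}(\theta)}{P_{j(x-1)}(\theta)}\right|_{\theta = b_{jx}} = e^{0} = 1 .
```

Below the threshold the ratio is less than 1, so the lower category is the more likely of the two; above it, the higher category is.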

32 PCM and GPCM vs. GRM Very similar to GRM, except these models allow for the fact that one or more of the score categories may never have a point where the probability of x is greatest for a given θ level. Because of local estimation, there is no guarantee that category b-values will be ordered. This is a flaw or a strength, depending on how you look at it.

33 PCM and GPCM vs. GRM GPCM and GRM will generally agree very closely, unless one or more of the score categories is underused. GRM will force the category boundary parameters to be ordered; GPCM and PCM do not. For this reason, comparing results with the same data across models can point out interesting phenomena in your data.

34 Score Category Response Functions These look about the same as the GRM, except... [Figure: $P_{jx}(\theta)$ versus ability (θ); curves P1 to P5]

35 Score Category Response Functions ...a score of 2 is never more probable than the others. [Figure: $P_{jx}(\theta)$ versus ability (θ); curves P1 to P5]
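A category that is never the most probable one arises when adjacent step parameters are out of order. The sketch below illustrates this for the PCM ($a_j = 1$); the function name `pcm_probs` and the step values are hypothetical, chosen so that the step into score 3 is easier than the step into score 2.

```python
import math


def pcm_probs(theta, bs):
    """PCM category probabilities (a_j = 1) for step parameters bs."""
    sums = [0.0]  # the score-0 term is fixed at 0 (assumed convention)
    for b in bs:
        sums.append(sums[-1] + (theta - b))
    nums = [math.exp(s) for s in sums]
    total = sum(nums)
    return [n / total for n in nums]


# Hypothetical item with disordered steps: b_3 = -0.5 is below b_2 = 0.5
steps = [-1.5, 0.5, -0.5, 1.5]
grid = [t / 10 for t in range(-40, 41)]  # theta from -4.0 to 4.0

# Collect every score that is the most probable at some grid point
modal = set()
for t in grid:
    p = pcm_probs(t, steps)
    modal.add(p.index(max(p)))
print(sorted(modal))
```

Score 2 beats score 1 only when θ > 0.5 and beats score 3 only when θ < -0.5, so it can never be the modal category for this item.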

36 Partial Credit Function Open "partial credit model demo.xls" for the file demo. [Figure: Partial Credit IRF; curves for P(X=0|Theta), P(X=1|Theta), P(X=2|Theta), P(X=3|Theta), and P(X=4|Theta) plotted against Theta]

37 Simplifying Polytomous Items

38 Expected Scores It is useful to combine the probability information from categories into one function for an expected score:
$$E[X_j \mid \theta] = \sum_{x=0}^{m_j} x \, P_{jx}(\theta)$$
Multiply each score by its P, add up over categories for any θ level.

39 Item Characteristic Function This expected score function acts as a single Item Characteristic Function (analogous to the ICC for dichotomous items).
$$E[X_j \mid \theta] = \sum_{x=0}^{m_j} x \, P_{jx}(\theta)$$
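The expected score is a probability-weighted sum of the score values. A small worked sketch, in which the function name `expected_score` and the category probabilities are made-up illustrative values rather than output from a fitted model:

```python
def expected_score(category_probs):
    """E[X | theta]: sum over scores x of x * P_jx(theta)."""
    return sum(x * p for x, p in enumerate(category_probs))


# Hypothetical category probabilities for scores 0..4 at one theta level
probs = [0.05, 0.15, 0.40, 0.30, 0.10]
e_x = expected_score(probs)          # 0.15 + 0.80 + 0.90 + 0.40
proportion = e_x / (len(probs) - 1)  # expected proportion, E(X) / m_j
print(e_x, proportion)
```

Evaluating this at a grid of θ values, with the probabilities coming from whichever polytomous model is in use, traces out the item characteristic function.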

40 Item Characteristic Function [Figure: expected score E(X) versus ability (θ)]

41 Expected Proportion Correct [Figure: expected proportion = E(X)/$m_j$ versus ability (θ)]

42 ICF [Figure: probability of x versus ability (θ), one curve per score from x = 0 to x = 4, shown with the ICF]

43 Item Characteristic Function The ICF is a good summary of an item and is used in test development, DIF studies, and model-data fit evaluations.

44 Test Characteristic Function As before, equal to the sum of expected scores over items:
$$E[X \mid \theta] = \sum_{j=1}^{n} \sum_{x=0}^{m_j} x \, P_{jx}(\theta)$$
This could include dichotomous, polytomous, or mixed-format tests.

45 Nominal Response Models

46 Nominal Response Data Nominal Response Models (e.g., Bock, 1972) are models for polytomous data where item responses are not numeric values. Rather, item responses are in the form of nominal categories. Information gained from the use of such models can be useful for detecting which distractor options are better than others in multiple-choice tests.

47 Nominal Response Model Features The nominal response model is a model for the categorical responses possible within an item of a test. The model specifies the probability that an examinee with a given value of the latent trait selects response option m.

48 Example Nominal Response Item

49 Additional Item Types Non-cognitive tests can also contain differing item types that could be modeled using a Nominal Response Model. For example, consider an item from a questionnaire about political attitudes.

50 Example Nominal Response Item Which political party would you identify yourself with? 1. Democrat 2. Republican 3. Independent 4. Green 5. Unaffiliated

51 Nominal Response Model An extension of the 2PL model for nominal response data is Bock's Nominal Response Model (1972). The nominal response model specifies the likelihood that an examinee of a given ability will select option k of item j.

52 Nominal Response Model Bock's Nominal Response Model:
$$P(X_{ij} = k \mid \theta_i) = \frac{e^{a_{jk}(\theta_i - b_{jk})}}{\sum_{h=1}^{m} e^{a_{jh}(\theta_i - b_{jh})}}$$

53 NRM Parameters
$$P(X_{ij} = k \mid \theta_i) = \frac{e^{a_{jk}(\theta_i - b_{jk})}}{\sum_{h=1}^{m} e^{a_{jh}(\theta_i - b_{jh})}}$$
For each level k of item j: one discrimination parameter $a_{jk}$ and one difficulty parameter $b_{jk}$. No additional constraints on the parameter values.
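The nominal response function has the form of a softmax over the option-specific terms. The sketch below follows the parameterization shown on the slide; the function name `nrm_probs` and the option parameters are hypothetical, picked only to show how options with different discriminations dominate at different ability levels.

```python
import math


def nrm_probs(theta, a, b):
    """P(X = k | theta) for each option k, with terms a_k * (theta - b_k)."""
    z = [a_k * (theta - b_k) for a_k, b_k in zip(a, b)]
    # Subtract the largest term before exponentiating for numerical stability
    top = max(z)
    weights = [math.exp(v - top) for v in z]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical four-option item; the third option has the largest a_jk
a_jk = [-1.0, -0.2, 1.5, 0.3]
b_jk = [0.0, 0.5, 0.2, -0.4]
low = nrm_probs(-3.0, a_jk, b_jk)
high = nrm_probs(3.0, a_jk, b_jk)
print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

At high θ the option with the largest discrimination becomes the most likely choice, while at low θ the option with the most negative discrimination does, which is how the model separates stronger from weaker distractors.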

54 Nominal Response Function Open "nominal response demo.xls" for the file demo. [Figure: Nominal Response IRF; P(X=m|Theta) versus Theta, with curves for P(X=a|Theta), P(X=b|Theta), P(X=c|Theta), and P(X=d|Theta)]

55 Conclusion Numerous polytomous IRT models exist. Each extends the basic philosophy of IRT to a data type that is not binary. Some differ in how the model is parameterized for the same data. Others differ in the type of data to be modeled. Each polytomous IRT model specifies the behavior of an examinee as a function of a latent trait (often representing ability).

56 Next Estimation of Parameters for IRT Models: estimating person parameters when item parameters are known; joint estimation of person and item parameters.
