A comparison study of IRT calibration methods for mixed-format tests in vertical scaling


University of Iowa, Iowa Research Online: Theses and Dissertations, 2007

Copyright 2007 Huijuan Meng. This dissertation is available at Iowa Research Online.

Recommended citation: Meng, Huijuan. "A comparison study of IRT calibration methods for mixed-format tests in vertical scaling." PhD (Doctor of Philosophy) thesis, University of Iowa, 2007.

A COMPARISON STUDY OF IRT CALIBRATION METHODS FOR MIXED-FORMAT TESTS IN VERTICAL SCALING

by Huijuan Meng

An Abstract Of a thesis submitted in partial fulfillment of the requirements for the Doctor of Philosophy degree in Psychological and Quantitative Foundations (Educational Measurement and Statistics) in the Graduate College of The University of Iowa

December 2007

Thesis Supervisors: Professor Walter Vispoel
                    Research Scientist Won-Chan Lee

ABSTRACT

The purpose of this dissertation was to investigate how different Item Response Theory (IRT)-based calibration methods affect the recovery of student achievement growth patterns. Ninety-six vertical scales (4 × 2 × 2 × 2 × 3) were constructed using different combinations of IRT calibration methods (separate, pair-wise concurrent, semi-concurrent, and concurrent), lengths of common-item set ( vs. common items), types of common-item set (dichotomous-only vs. mixed-format), and numbers of polytomous items ( vs. ) for three simulated datasets differing in the number of examinees sampled per grade (5,, 5). Three indexes (absolute bias, standard error of equating, and root mean square error) were used to evaluate the performance of the calibration methods in recovering proficiency score distributions over 8 replications. These indexes were derived for seven growth distribution criterion parameters (mean, standard deviation, effect size, and proportions of examinees within four proficiency categories). Although exceptions were found in the results for all criterion parameters, important general trends did emerge. The pair-wise concurrent and semi-concurrent calibration methods performed better than the concurrent and separate calibration methods for most criterion parameters and combinations of research conditions. Separate calibration, the vertical scaling method used most often in practice, provided the poorest results in most instances. The accuracy of vertical scaling also typically improved with larger samples of examinees, more common items, mixed item formats in the common-item set, and increases in the number of polytomous items in the common-item set. General trends and exceptional cases from the various analyses are described in detail, with tables provided for choosing an appropriate vertical scaling method in different decision contexts.

Abstract Approved:
    Thesis Supervisor    Title and Department    Date
    Thesis Supervisor    Title and Department    Date

A COMPARISON STUDY OF IRT CALIBRATION METHODS FOR MIXED-FORMAT TESTS IN VERTICAL SCALING

by Huijuan Meng

A thesis submitted in partial fulfillment of the requirements for the Doctor of Philosophy degree in Psychological and Quantitative Foundations (Educational Measurement and Statistics) in the Graduate College of The University of Iowa

December 2007

Thesis Supervisors: Professor Walter Vispoel
                    Research Scientist Won-Chan Lee

Copyright by HUIJUAN MENG 2007. All Rights Reserved.

Graduate College
The University of Iowa
Iowa City, Iowa

CERTIFICATE OF APPROVAL

PH.D. THESIS

This is to certify that the Ph.D. thesis of Huijuan Meng has been approved by the Examining Committee for the thesis requirement for the Doctor of Philosophy degree in Psychological and Quantitative Foundations (Educational Measurement and Statistics) at the December 2007 graduation.

Thesis Committee:
    Walter Vispoel, Thesis Supervisor
    Won-Chan Lee, Thesis Supervisor
    Michael Kolen
    Timothy Ansley
    Richard Dykstra

To Zhongming and My Parents

ACKNOWLEDGEMENTS

First, I would like to thank Dr. Walter Vispoel, my advisor, thesis supervisor, and the professor who led me into the field of measurement. This thesis could not have been written without his continuous guidance, support, and assistance throughout my academic program. I truly cherish the memory of being his student. I would also like to thank Dr. Won-Chan Lee, my thesis co-supervisor. During my thesis writing, Dr. Lee was very supportive and contributed many thoughtful insights. His suggestions and comments helped make this dissertation better. I sincerely want to thank my other committee members. Dr. Michael Kolen taught me much about vertical scaling, the area in which this dissertation was written. Dr. Timothy Ansley helped me obtain the opportunity to work for Iowa Testing Programs, which supported my studies financially for more than three years. Dr. Richard Dykstra, my mathematical statistics professor, was very patient and generous with his time. I sincerely thank all of them for serving on my committee. I wish to thank Dr. Robert Kirkpatrick, my internship mentor at Pearson Educational Measurement, for helping me obtain important information used in this dissertation and for providing valuable suggestions for my thesis. Finally, I would like to thank my family: my parents, my husband, and my lovely daughter. They provided the motivation for me to pursue my degrees in the U.S. over the past years. Their love has made me what I am today!


TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES

CHAPTER I. INTRODUCTION
  IRT Calibration Methods
  Common-Item Design and Control Factors
  Evaluation Criteria
  Research Questions

CHAPTER II. LITERATURE REVIEW
  IRT Calibration Methods
    IRT Models
      The Three-Parameter Logistic Model (3PLM)
      The Generalized Partial Credit Model (GPCM)
    IRT Scaling
      The Nature of the IRT Scale
      IRT Calibration Methods
    Proficiency Score Estimation Methods
    IRT Vertical Scale Evaluation Criteria
  Vertical Scaling Data Collection Designs
    Comparison of Data Collection Designs
  Common-Item Designs: Investigation Factors
    Sample Size
    Length of Common-Item Set
    Type of Common-Item Set
    Number of Polytomous Items

CHAPTER III. METHODOLOGY
  Vertical Scaling Scenario
  Configuration of Mixed-Format Tests for Simulation
  Item Pool for Mixed-Format Tests
    Dichotomous Item and Dichotomous Common Item Selection
    Polytomous Item and Polytomous Common Item Selection
  Simulation Study
    Factors Investigated
    Data Generation
    Data Calibration Using ICL
  Evaluation Criteria and Data Analysis
    Proficiency Score Means
    Proficiency Score Standard Deviations
    Effect Sizes
    Proficiency Score Classification Proportions
  Summary

CHAPTER IV. RESULTS
  Effects of Calibration Method on Scaling Results
    Proficiency Score Means: Absolute Bias; Standard Error (SE); Root Mean Squared Error (RMSE)
    Proficiency Score Standard Deviations: Absolute Bias; Standard Error (SE); Root Mean Squared Error (RMSE)
    Effect Sizes: Absolute Bias; Standard Error (SE); Root Mean Squared Error (RMSE)
    Classification Proportion Level 1: Absolute Bias; Standard Error (SE); Root Mean Squared Error (RMSE)
    Classification Proportion Level 2: Absolute Bias; Standard Error (SE); Root Mean Squared Error (RMSE)
    Classification Proportion Level 3: Absolute Bias; Standard Error (SE); Root Mean Squared Error (RMSE)
    Classification Proportion Level 4: Absolute Bias; Standard Error (SE); Root Mean Squared Error (RMSE)
  Summary of Results for Calibration Methods
  Effects of Sample Size, Length of Common-Item Set, Type of Common-Item Set, and Number of Polytomous Items on Scaling Results
    Effect of Sample Size
    Effect of Length of Common-Item Set
    Effect of Type of Common-Item Set
    Effect of Number of Polytomous Items (Part 1)
    Effect of Number of Polytomous Items (Part 2)
    Summary of Results for Sample Size, Length of Common-Item Set, Type of Common-Item Set, and Number of Polytomous Items

CHAPTER V. SUMMARY AND DISCUSSION
  Review of Study Goals and Methodology
  Major Findings by Research Question
    Research Question 1
    Research Question 2
    Research Question 3
    Research Question 4
    Research Question 5
    Research Question 6
  Limitations and Suggestions for Future Research
  Final Summary and Comments

REFERENCES

APPENDIX A. PARSCALE AND ICL CODES
APPENDIX B. TABLES B1-B63: INDEX RESULTS
APPENDIX C. FIGURES C1-C63: INDEX RESULTS

LIST OF TABLES

Table 3.1. Test Characteristics
Table 3.2. Item Pool Summary
Table 3.3. Item Pool Item Parameter Statistics (Mean)
Table 3.4. Study Test Item Parameter Statistics (Mean)
Table 3.5. Summary of 96 Simulation Conditions
Table 3.6. Scaling Constants to Scale to Adjacent Grade
Table 3.7. Cumulative Scaling Constants to Scale to Grade 5 Scale
Table 3.8. True Proficiency Score Distribution from Grade 3 to Grade 8
Table 3.9. Proficiency Cut Scores for Grade 3 to Grade 8
Table 3.10. True Vertical Scale Parameters
Table 4.1. Calibration Methods Comparison Results (Mean)
Table 4.2. Calibration Methods Comparison Results (Standard Deviation)
Table 4.3. Calibration Methods Comparison Results (Effect Size)
Table 4.4. Calibration Methods Comparison Results (Level 1)
Table 4.5. Calibration Methods Comparison Results (Level 2)
Table 4.6. Calibration Methods Comparison Results (Level 3)
Table 4.7. Calibration Methods Comparison Results (Level 4)
Table 4.8. Calibration Methods Comparison Results: Methods Producing the Smallest Index Values
Table 4.9. Calibration Methods Comparison Results: Methods Producing the Largest Index Values
Table 4.10. Calibration Methods Comparison Results: Frequencies of Producing the Smallest Index Values
Table 4.11. Calibration Methods Comparison Results: Frequencies of Producing the Largest Index Values
Table 4.12. Effect of Factor 1: Sample Size (Mean)
Table 4.13. Effect of Factor 1: Sample Size (Standard Deviation)
Table 4.14. Effect of Factor 1: Sample Size (Effect Size)

Table 4.15. Effect of Factor 1: Sample Size (Level 1)
Table 4.16. Effect of Factor 1: Sample Size (Level 2)
Table 4.17. Effect of Factor 1: Sample Size (Level 3)
Table 4.18. Effect of Factor 1: Sample Size (Level 4)
Table 4.19. Overall Trends for Effect of Sample Size and Exceptional Cases
Table 4.20. Effect of Factor 2: Length of Common-Item Set (Mean)
Table 4.21. Effect of Factor 2: Length of Common-Item Set (Standard Deviation)
Table 4.22. Effect of Factor 2: Length of Common-Item Set (Effect Size)
Table 4.23. Effect of Factor 2: Length of Common-Item Set (Level 1)
Table 4.24. Effect of Factor 2: Length of Common-Item Set (Level 2)
Table 4.25. Effect of Factor 2: Length of Common-Item Set (Level 3)
Table 4.26. Effect of Factor 2: Length of Common-Item Set (Level 4)
Table 4.27. Overall Trends for Effects of Length of Common-Item Set and Exceptional Cases
Table 4.28. Effect of Factor 3: Type of Common-Item Set (Mean)
Table 4.29. Effect of Factor 3: Type of Common-Item Set (Standard Deviation)
Table 4.30. Effect of Factor 3: Type of Common-Item Set (Effect Size)
Table 4.31. Effect of Factor 3: Type of Common-Item Set (Level 1)
Table 4.32. Effect of Factor 3: Type of Common-Item Set (Level 2)
Table 4.33. Effect of Factor 3: Type of Common-Item Set (Level 3)
Table 4.34. Effect of Factor 3: Type of Common-Item Set (Level 4)
Table 4.35. Overall Trends for Effects of Type of Common-Item Set and Exceptional Cases
Table 4.36. Effect of Factor 4 (1): Number of Polytomous Items in Tests Containing a Mixed-Format CI Set (Mean)
Table 4.37. Effect of Factor 4 (1): Number of Polytomous Items in Tests Containing a Mixed-Format CI Set (Standard Deviation)
Table 4.38. Effect of Factor 4 (1): Number of Polytomous Items in Tests Containing a Mixed-Format CI Set (Effect Size)

Table 4.39. Effect of Factor 4 (1): Number of Polytomous Items in Tests Containing a Mixed-Format CI Set (Level 1)
Table 4.40. Effect of Factor 4 (1): Number of Polytomous Items in Tests Containing a Mixed-Format CI Set (Level 2)
Table 4.41. Effect of Factor 4 (1): Number of Polytomous Items in Tests Containing a Mixed-Format CI Set (Level 3)
Table 4.42. Effect of Factor 4 (1): Number of Polytomous Items in Tests Containing a Mixed-Format CI Set (Level 4)
Table 4.43. Overall Trends for Effects of Number of Polytomous Items in Tests Containing a Mixed-Format CI and Exceptional Cases
Table 4.44. Effect of Factor 4 (2): Number of Polytomous Items in Tests Containing a DI-Only CI Set (Mean)
Table 4.45. Effect of Factor 4 (2): Number of Polytomous Items in Tests Containing a DI-Only CI Set (Standard Deviation)
Table 4.46. Effect of Factor 4 (2): Number of Polytomous Items in Tests Containing a DI-Only CI Set (Effect Size)
Table 4.47. Effect of Factor 4 (2): Number of Polytomous Items in Tests Containing a DI-Only CI Set (Level 1)
Table 4.48. Effect of Factor 4 (2): Number of Polytomous Items in Tests Containing a DI-Only CI Set (Level 2)
Table 4.49. Effect of Factor 4 (2): Number of Polytomous Items in Tests Containing a DI-Only CI Set (Level 3)
Table 4.50. Effect of Factor 4 (2): Number of Polytomous Items in Tests Containing a DI-Only CI Set (Level 4)
Table 5.1. Summary Absolute Biases for Standard Deviation over 8 Replications for N= and N=5, Test : PI, CI (DI Only)
Table 5.2. Standard Deviations of Stocking-Lord Linking Parameter Estimates
Table 5.3. Polytomous Item Parameter Estimates for Test 5-Test 8 (N=5, Replication=)
Table B1. Absolute Bias (Proficiency Score Mean), N=5
Table B2. Absolute Bias (Proficiency Score Mean), N=
Table B3. Absolute Bias (Proficiency Score Mean), N=5
Table B4. SE (Proficiency Score Mean), N=5
Table B5. SE (Proficiency Score Mean), N=

Table B6. SE (Proficiency Score Mean), N=5
Table B7. RMSE (Proficiency Score Mean), N=5
Table B8. RMSE (Proficiency Score Mean), N=
Table B9. RMSE (Proficiency Score Mean), N=5
Table B10. Absolute Bias (Proficiency Score SD), N=5
Table B11. Absolute Bias (Proficiency Score SD), N=
Table B12. Absolute Bias (Proficiency Score SD), N=5
Table B13. SE (Proficiency Score SD), N=5
Table B14. SE (Proficiency Score SD), N=
Table B15. SE (Proficiency Score SD), N=5
Table B16. RMSE (Proficiency Score SD), N=5
Table B17. RMSE (Proficiency Score SD), N=
Table B18. RMSE (Proficiency Score SD), N=5
Table B19. Absolute Bias (Effect Size), N=5
Table B20. Absolute Bias (Effect Size), N=
Table B21. Absolute Bias (Effect Size), N=5
Table B22. SE (Effect Size), N=5
Table B23. SE (Effect Size), N=
Table B24. SE (Effect Size), N=5
Table B25. RMSE (Effect Size), N=5
Table B26. RMSE (Effect Size), N=
Table B27. RMSE (Effect Size), N=5
Table B28. Absolute Bias (Classification Proportion Level 1), N=5
Table B29. Absolute Bias (Classification Proportion Level 1), N=
Table B30. Absolute Bias (Classification Proportion Level 1), N=5
Table B31. SE (Classification Proportion Level 1), N=5
Table B32. SE (Classification Proportion Level 1), N=

Table B33. SE (Classification Proportion Level 1), N=5
Table B34. RMSE (Classification Proportion Level 1), N=5
Table B35. RMSE (Classification Proportion Level 1), N=
Table B36. RMSE (Classification Proportion Level 1), N=5
Table B37. Absolute Bias (Classification Proportion Level 2), N=5
Table B38. Absolute Bias (Classification Proportion Level 2), N=
Table B39. Absolute Bias (Classification Proportion Level 2), N=5
Table B40. SE (Classification Proportion Level 2), N=5
Table B41. SE (Classification Proportion Level 2), N=
Table B42. SE (Classification Proportion Level 2), N=5
Table B43. RMSE (Classification Proportion Level 2), N=5
Table B44. RMSE (Classification Proportion Level 2), N=
Table B45. RMSE (Classification Proportion Level 2), N=5
Table B46. Absolute Bias (Classification Proportion Level 3), N=5
Table B47. Absolute Bias (Classification Proportion Level 3), N=
Table B48. Absolute Bias (Classification Proportion Level 3), N=5
Table B49. SE (Classification Proportion Level 3), N=5
Table B50. SE (Classification Proportion Level 3), N=
Table B51. SE (Classification Proportion Level 3), N=5
Table B52. RMSE (Classification Proportion Level 3), N=5
Table B53. RMSE (Classification Proportion Level 3), N=
Table B54. RMSE (Classification Proportion Level 3), N=5
Table B55. Absolute Bias (Classification Proportion Level 4), N=5
Table B56. Absolute Bias (Classification Proportion Level 4), N=
Table B57. Absolute Bias (Classification Proportion Level 4), N=5
Table B58. SE (Classification Proportion Level 4), N=5
Table B59. SE (Classification Proportion Level 4), N=

Table B60. SE (Classification Proportion Level 4), N=5
Table B61. RMSE (Classification Proportion Level 4), N=5
Table B62. RMSE (Classification Proportion Level 4), N=
Table B63. RMSE (Classification Proportion Level 4), N=5

LIST OF FIGURES

Figure 3.1. Data Set-up for Each Calibration Method
Figure 3.2. Theta Estimates: PARSCALE vs. ICL (N=995)
Figure 3.3. Theta Estimate Differences: PARSCALE - ICL (N=995)
Figure 4.1. Calibration Methods Comparison Results (Mean)
Figure 4.2. Calibration Methods Comparison Results (Standard Deviation)
Figure 4.3. Calibration Methods Comparison Results (Effect Size)
Figure 4.4. Calibration Methods Comparison Results (Level 1)
Figure 4.5. Calibration Methods Comparison Results (Level 2)
Figure 4.6. Calibration Methods Comparison Results (Level 3)
Figure 4.7. Calibration Methods Comparison Results (Level 4)
Figure 4.8. Effect of Factor 1: Sample Sizes (Mean)
Figure 4.9. Effect of Factor 1: Sample Sizes (Standard Deviation)
Figure 4.10. Effect of Factor 1: Sample Sizes (Effect Size)
Figure 4.11. Effect of Factor 1: Sample Sizes (Level 1)
Figure 4.12. Effect of Factor 1: Sample Sizes (Level 2)
Figure 4.13. Effect of Factor 1: Sample Sizes (Level 3)
Figure 4.14. Effect of Factor 1: Sample Sizes (Level 4)
Figure 4.15. Effect of Factor 2: Length of Common-Item Set (Mean)
Figure 4.16. Effect of Factor 2: Length of Common-Item Set (Standard Deviation)
Figure 4.17. Effect of Factor 2: Length of Common-Item Set (Effect Size)
Figure 4.18. Effect of Factor 2: Length of Common-Item Set (Level 1)
Figure 4.19. Effect of Factor 2: Length of Common-Item Set (Level 2)
Figure 4.20. Effect of Factor 2: Length of Common-Item Set (Level 3)
Figure 4.21. Effect of Factor 2: Length of Common-Item Set (Level 4)
Figure 4.22. Effect of Factor 3: Type of Common-Item Set (Mean)
Figure 4.23. Effect of Factor 3: Type of Common-Item Set (Standard Deviation)

Figure 4.24. Effect of Factor 3: Type of Common-Item Set (Effect Size)
Figure 4.25. Effect of Factor 3: Type of Common-Item Set (Level 1)
Figure 4.26. Effect of Factor 3: Type of Common-Item Set (Level 2)
Figure 4.27. Effect of Factor 3: Type of Common-Item Set (Level 3)
Figure 4.28. Effect of Factor 3: Type of Common-Item Set (Level 4)
Figure 4.29. Effect of Factor 4 (1): Number of Polytomous Items in Tests Containing a Mixed-Format CI Set (Mean)
Figure 4.30. Effect of Factor 4 (1): Number of Polytomous Items in Tests Containing a Mixed-Format CI Set (Standard Deviation)
Figure 4.31. Effect of Factor 4 (1): Number of Polytomous Items in Tests Containing a Mixed-Format CI Set (Effect Size)
Figure 4.32. Effect of Factor 4 (1): Number of Polytomous Items in Tests Containing a Mixed-Format CI Set (Level 1)
Figure 4.33. Effect of Factor 4 (1): Number of Polytomous Items in Tests Containing a Mixed-Format CI Set (Level 2)
Figure 4.34. Effect of Factor 4 (1): Number of Polytomous Items in Tests Containing a Mixed-Format CI Set (Level 3)
Figure 4.35. Effect of Factor 4 (1): Number of Polytomous Items in Tests Containing a Mixed-Format CI Set (Level 4)
Figure 4.36. Effect of Factor 4 (2): Number of Polytomous Items in Tests Containing a DI-Only CI Set (Mean)
Figure 4.37. Effect of Factor 4 (2): Number of Polytomous Items in Tests Containing a DI-Only CI Set (Standard Deviation)
Figure 4.38. Effect of Factor 4 (2): Number of Polytomous Items in Tests Containing a DI-Only CI Set (Effect Size)
Figure 4.39. Effect of Factor 4 (2): Number of Polytomous Items in Tests Containing a DI-Only CI Set (Level 1)
Figure 4.40. Effect of Factor 4 (2): Number of Polytomous Items in Tests Containing a DI-Only CI Set (Level 2)
Figure 4.41. Effect of Factor 4 (2): Number of Polytomous Items in Tests Containing a DI-Only CI Set (Level 3)
Figure 4.42. Effect of Factor 4 (2): Number of Polytomous Items in Tests Containing a DI-Only CI Set (Level 4)
Figure C1. Absolute Bias (Proficiency Score Mean), N=5

Figure C2. Absolute Bias (Proficiency Score Mean), N=
Figure C3. Absolute Bias (Proficiency Score Mean), N=5
Figure C4. SE (Proficiency Score Mean), N=5
Figure C5. SE (Proficiency Score Mean), N=
Figure C6. SE (Proficiency Score Mean), N=5
Figure C7. RMSE (Proficiency Score Mean), N=5
Figure C8. RMSE (Proficiency Score Mean), N=
Figure C9. RMSE (Proficiency Score Mean), N=5
Figure C10. Absolute Bias (Proficiency Score SD), N=5
Figure C11. Absolute Bias (Proficiency Score SD), N=
Figure C12. Absolute Bias (Proficiency Score SD), N=5
Figure C13. SE (Proficiency Score SD), N=5
Figure C14. SE (Proficiency Score SD), N=
Figure C15. SE (Proficiency Score SD), N=5
Figure C16. RMSE (Proficiency Score SD), N=5
Figure C17. RMSE (Proficiency Score SD), N=
Figure C18. RMSE (Proficiency Score SD), N=5
Figure C19. Absolute Bias (Effect Size), N=5
Figure C20. Absolute Bias (Effect Size), N=
Figure C21. Absolute Bias (Effect Size), N=5
Figure C22. SE (Effect Size), N=5
Figure C23. SE (Effect Size), N=
Figure C24. SE (Effect Size), N=5
Figure C25. RMSE (Effect Size), N=5
Figure C26. RMSE (Effect Size), N=
Figure C27. RMSE (Effect Size), N=5
Figure C28. Absolute Bias (Classification Proportion Level 1), N=5

Figure C29. Absolute Bias (Classification Proportion Level 1), N=
Figure C30. Absolute Bias (Classification Proportion Level 1), N=5
Figure C31. SE (Classification Proportion Level 1), N=5
Figure C32. SE (Classification Proportion Level 1), N=
Figure C33. SE (Classification Proportion Level 1), N=5
Figure C34. RMSE (Classification Proportion Level 1), N=5
Figure C35. RMSE (Classification Proportion Level 1), N=
Figure C36. RMSE (Classification Proportion Level 1), N=5
Figure C37. Absolute Bias (Classification Proportion Level 2), N=5
Figure C38. Absolute Bias (Classification Proportion Level 2), N=
Figure C39. Absolute Bias (Classification Proportion Level 2), N=5
Figure C40. SE (Classification Proportion Level 2), N=5
Figure C41. SE (Classification Proportion Level 2), N=
Figure C42. SE (Classification Proportion Level 2), N=5
Figure C43. RMSE (Classification Proportion Level 2), N=5
Figure C44. RMSE (Classification Proportion Level 2), N=
Figure C45. RMSE (Classification Proportion Level 2), N=5
Figure C46. Absolute Bias (Classification Proportion Level 3), N=5
Figure C47. Absolute Bias (Classification Proportion Level 3), N=
Figure C48. Absolute Bias (Classification Proportion Level 3), N=5
Figure C49. SE (Classification Proportion Level 3), N=5
Figure C50. SE (Classification Proportion Level 3), N=
Figure C51. SE (Classification Proportion Level 3), N=5
Figure C52. RMSE (Classification Proportion Level 3), N=5
Figure C53. RMSE (Classification Proportion Level 3), N=
Figure C54. RMSE (Classification Proportion Level 3), N=5
Figure C55. Absolute Bias (Classification Proportion Level 4), N=5

Figure C56. Absolute Bias (Classification Proportion Level 4), N=
Figure C57. Absolute Bias (Classification Proportion Level 4), N=5
Figure C58. SE (Classification Proportion Level 4), N=5
Figure C59. SE (Classification Proportion Level 4), N=
Figure C60. SE (Classification Proportion Level 4), N=5
Figure C61. RMSE (Classification Proportion Level 4), N=5
Figure C62. RMSE (Classification Proportion Level 4), N=
Figure C63. RMSE (Classification Proportion Level 4), N=5

CHAPTER I. INTRODUCTION

Vertical scaling refers to the process of placing scores on tests that measure similar constructs at different difficulty levels onto a common metric called a vertical scale. Such scales are usually constructed from multilevel educational achievement or aptitude tests so that students' academic growth can be charted across years. To ensure that students gain sufficiently from school learning, it is important for educators and researchers to examine students' growth patterns over years so that the quality of education can be maintained and improved. In the past, vertical scales have been incorporated into nationally normed elementary test batteries such as the TerraNova Comprehensive Tests of Basic Skills (CTB/McGraw-Hill), the Stanford Achievement Test Series (Harcourt Assessment), and the Iowa Tests of Basic Skills (Hoover, Dunbar, & Frisbie, 2003), but they have rarely been used in state accountability systems because most states have traditionally not assessed performance across contiguous grades. Due to the impact of the No Child Left Behind Act of 2001 (NCLB; Public Law 107-110), however, there is an increasing demand for psychometrically defensible vertical scales for high-stakes accountability tests so that students' year-to-year growth in a particular subject area can be measured and tracked.

Many factors can affect the nature of a vertical scale (Kolen & Brennan, 2004). Choices for a data collection design and for a test scoring method are two fundamental ones. Data collection designs for vertical scaling include the common-item design, equivalent groups design, single group design (referred to as a common-examinee design in this dissertation), and scaling test design (Kolen & Brennan, 2004). Several researchers have compared the impact of data collection designs on the nature of resulting vertical

scales and growth interpretation (Andrews, 1995; Hendrickson, Kolen, & Tong, ; Hendrickson, Wei, Kolen, & Tong, 2005; Paek & Young, 2005; Petersen, Kolen, & Hoover, 1989; Tong, 2005). Findings are not congruent across studies, however. Similarly, the choice of a test scoring method (options include Thurstone, Hieronymus, and item response theory (IRT) scaling) has been examined in several studies (Becker & Forsyth, 1992; Clemans, 1993; Hoover, 1984; Williams, Pommerich, & Thissen, 1998; Yen & Burket, 1997). Again, results for scaling and interpretation are not always consistent across studies.

In this dissertation, I focus exclusively on IRT scaling within a common-item (CI) design. This approach is being used increasingly because the CI design is straightforward to implement and administer in practice. Several important factors have been investigated in previous vertical scaling studies using IRT, including calibration methods, proficiency score estimation methods, grade levels of linking items, choices for base year, and priors for proficiency score distributions (Bishop & Omar, ; Chin, Kim, & Nering, ; Hendrickson et al., ; Hendrickson et al., 2005; Hendrickson, Cao, Chae, & Li, ; Jiao & Wang, 2007; Karkee, Lewis, Hoskens, Yao, & Haug, 2003; Karkee, Wang, & Green, ; Kim, Frisbie, Kolen, & Kim, 2007; Meng, Kolen, & Lohman, ; Tong, 2005; Tong & Kolen, 2007; Wang, Jiao, Young, & Jin, ; Yao & Lewis, ; Yao & Mao, ). In these studies, evaluating the performance of different IRT calibration methods has received much more emphasis than any other factor.
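The calibration methods compared in this dissertation differ mainly in how the grade-level datasets are grouped into estimation runs and, consequently, in how many linking steps are needed afterward. The following is a minimal sketch, assuming the grade 3 through grade 8 span used in this study; the exact run layout (for instance, two semi-concurrent runs of three grades each) is illustrative rather than prescribed:

```python
# Grouping of grade-level datasets into calibration runs under each method.
# Grades 3-8 as in this study; the semi-concurrent split into two runs of
# three consecutive grades is one of the layouts the method allows.
GRADES = [3, 4, 5, 6, 7, 8]

runs = {
    "separate":   [[g] for g in GRADES],             # one run per grade
    "pairwise":   [[g, g + 1] for g in GRADES[:-1]], # every adjacent pair
    "semi":       [GRADES[:3], GRADES[3:]],          # two multi-grade runs
    "concurrent": [GRADES],                          # all grades at once
}

# Placing R runs onto one scale requires R - 1 linking steps.
links_needed = {method: len(r) - 1 for method, r in runs.items()}
```

Under this layout, separate calibration needs five linking steps, pair-wise concurrent four, semi-concurrent only one, and concurrent none, which mirrors the relative efficiency of the methods as they are described in this chapter.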

IRT Calibration Methods

Various calibration methods can be used to construct an IRT common scale across groups. Two frequently researched methods are separate and concurrent calibration. In this dissertation, two additional, less-studied methods, pair-wise concurrent and semi-concurrent calibration, are also included.

In separate calibration, item parameters and examinee proficiency scores for each grade are estimated individually. For each grade, the proficiency score mean is set to 0 and the standard deviation to 1 in the calibration, which is referred to as a (0, 1) scale. However, the parameter estimates may not be on the same scale, because the (0, 1) scales are group dependent and may not have the same origin and unit. This can be addressed by identifying a base group whose parameter estimates remain on the (0, 1) scale; the estimates for the other grades are then placed onto the base group's scale through a series of transformations. Several linking procedures have been developed to place separate calibration results onto a common scale, including the mean/mean method (Loyd & Hoover, 1980), the mean/sigma method (Marco, 1977), the Stocking-Lord method (Stocking & Lord, 1983), and the Haebara method (Haebara, 1980). The mean/mean and mean/sigma methods are called moment procedures, and the Stocking-Lord and Haebara methods are called characteristic curve procedures. In implementing any of these procedures, the slope (A) and intercept (B) of a suitable linear transformation are identified and used to place one set of parameter estimates onto the scale of another set. These procedures were originally developed for dichotomous IRT models and were later extended to polytomous IRT models (Baker, 1992, 1993; Kim & Cohen, 1995;

Hattori, 1998; Kim & Hanson, ) and mixed-format tests (Kim & Lee, ). Researchers comparing linking methods have typically found that characteristic curve procedures produce more stable results than moment procedures (Baker & Al-Karni, 1991; Hanson & Béguin, ; Kim & Cohen, ; Kim & Song, ; Ogasawara, ). Nevertheless, the mean/mean and mean/sigma procedures are still used in some testing programs, due in part to the simplicity of calculating their transformation constants.

Employing the separate calibration method typically requires several computer runs and numerous repetitions of the linking process. Once software for simultaneously calibrating multiple groups became available, researchers began investigating the merits and weaknesses of concurrent calibration, in which data from all grades are combined and items not taken by a particular group are treated as not reached. The common scale is then established in a single computer run by concurrently estimating parameters for all items in the multilevel test. Consequently, all estimates are on the same IRT scale. This calibration method is efficient and can stabilize estimation of parameters for the common items when the IRT unidimensionality assumption holds in the data (Hanson & Béguin, ). However, when multidimensionality was present in the data, separate calibration using the Stocking-Lord procedure produced more accurate results than concurrent calibration (Béguin, Hanson, & Glas, ; Béguin & Hanson, ). In a vertical scaling context, where the number of groups involved is larger than two, the concurrent calibration method is sometimes preferred over the others because of its efficiency (Hendrickson et al., ; Tong & Kolen, ; Wang, Jiao, & Severance, 2005; Wang et al., ).
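As an illustration of the moment procedures described above, the sketch below computes mean/sigma and mean/mean linking constants from common-item parameter estimates on two scales and applies the resulting linear transformation theta* = A*theta + B. This is a minimal example with function and variable names of my own choosing, not code from this study or from any linking package:

```python
import numpy as np

def mean_sigma(b_from, b_to):
    """Mean/sigma linking: choose A and B so the common items' difficulty
    estimates match in mean and standard deviation across the two scales."""
    b_from, b_to = np.asarray(b_from, float), np.asarray(b_to, float)
    A = b_to.std(ddof=1) / b_from.std(ddof=1)
    B = b_to.mean() - A * b_from.mean()
    return A, B

def mean_mean(a_from, a_to, b_from, b_to):
    """Mean/mean linking: A from the mean discrimination (a) estimates,
    B from the mean difficulty (b) estimates."""
    A = np.mean(a_from) / np.mean(a_to)
    B = np.mean(b_to) - A * np.mean(b_from)
    return A, B

def transform(a, b, A, B):
    """Place item parameters onto the base scale: a* = a / A, b* = A*b + B."""
    return np.asarray(a, float) / A, A * np.asarray(b, float) + B
```

For instance, if the base-scale difficulties of the common items are an exact linear function of the other scale's, b_to = 2*b_from + 0.5, mean/sigma recovers A = 2 and B = 0.5. With noisy estimates the two moment procedures generally disagree, which is one reason the more stable characteristic curve procedures are usually preferred.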

However, the potential presence of multidimensionality in the data calls the appropriateness of concurrent calibration into question (Boughton, Yao, & Lorié, 2005; Tong & Kolen, 2007; Wang et al., 2005; Yao & Mao; Yao & Lewis). As a result, pair-wise concurrent and semi-concurrent calibration, hybrids of the separate and concurrent methods, have been proposed (Chin et al.; Karkee et al., 2003; Meng et al.). Although rarely used or researched, these methods may circumvent some inherent limitations of separate and concurrent calibration. In pair-wise concurrent calibration, data from every pair of adjacent grades are combined and calibrated in one computer run. Parameter estimates from all paired datasets are then put onto the same scale through fewer linking processes. In semi-concurrent calibration, two computer runs are performed; within each run, data from three or four consecutive grades are calibrated together. As a result, only one linking process is necessary to put parameters onto a common scale. Because the data used in each run for pair-wise and semi-concurrent calibration do not span all grades, the multidimensionality that content variability across grades may induce in concurrent calibration data may be diminished to some degree. Furthermore, compared to separate calibration, these alternative methods may produce more stable parameter estimates when sample sizes are small (Karkee et al., 2003).

In summary, numerous comparison studies of separate and concurrent calibration have been conducted, focusing on either real or simulated data. Comparison studies in vertical scaling contexts usually use empirical responses from operational testing programs (Bishop & Omar; Hendrickson et al.; Hendrickson et al., 2005; Karkee et al., 2003; Karkee et al.,

; Kim et al., 2007; Boughton et al., 2005; Meng et al.; Tong, 2005; Yao & Lewis). With such data, differences among scaling results can be detected, but the most accurate scaling method cannot be identified. That question can only be answered in a simulation study, in which the true parameters are known in advance. Most comparison studies using simulated data have been performed in equating situations where responses from two groups were generated (Cohen & Kim, 1998; Hanson & Béguin, 2002; Li, Tam, & Tompkins; Kim & Cohen, 1998, 2002; Kim & Kolen, 2006; Skorupski, Jodoin, & Keller, 2003). All of these studies focused only on dichotomous item responses. Likewise, simulation studies conducted under vertical scaling conditions have typically used multilevel test forms that include only dichotomous items (Chin et al.; Paek & Young, 2005; Tong & Kolen, 2007). Because constructing a vertical scale usually involves six or more nonequivalent groups, and because polytomous items are now common in the measurement field, it is important to extend comparison studies of IRT calibration methods to simulated mixed-format test data. On the basis of the literature surveyed, a simulation study was conducted in this dissertation. The results may provide practitioners with a better understanding of the effectiveness of the separate, pair-wise concurrent, semi-concurrent, and concurrent calibration methods.

Common-item Design and Control Factors

In this dissertation, students' responses were generated to mimic data collected via a common-item (CI) design. Under this design, examinees' proficiency levels are assumed to differ across grades, and a common scale is constructed by embedding a set of overlapping items in both

upper-grade and lower-grade tests. If an IRT scaling method is employed, the parameters can be estimated either through separate computer runs for each grade's data or through a single simultaneous run for all grades' data. In a simulation study under this design, control factors can be included to examine their possible mediating or moderating effects on the resulting scales. This dissertation includes four such factors.

The first factor is sample size. This factor is included in many IRT calibration method comparison studies conducted in equating contexts (Cohen & Kim, 1998; Hanson & Béguin, 2002; Kim & Cohen, 2002; Kim & Kolen, 2006; Kim & Lee, 2006; Skorupski et al., 2003). A rule of thumb Harris (1993) suggested in her review of the literature is to use approximately 1,500 examinees per form for the IRT three-parameter model. When a polytomous IRT model is employed, Muraki and Bock (2003) proposed that, although estimation involves a greater number of parameters, polytomous item response data contain more information than dichotomous item response data and therefore provide more stable estimation. Accordingly, Muraki noted that sample sizes around 250 are marginally acceptable in research applications, but 500 to 1,000 are required in operational use. Muraki's suggestions, however, should be considered in light of the datasets he used in the PARSCALE examples, which usually contain small numbers of items. For this reason, larger sample sizes were included in this dissertation to examine how much additional accuracy they might bring to the scaling results.

The second factor is the length of the common-item set in the test (Chin et al.; Kim & Cohen, 1998; Hanson & Béguin, 2002). According to Hanson and Béguin (2002), a longer common-item set leads to smaller mean square errors (MSE) in item parameter estimation. In real testing situations, when the size of the item pool is small or

the quality of items does not meet the test blueprint requirements, the number of usable items may be limited. Thus, including this factor in a simulation study can provide practitioners with direct evidence of how different common-item set lengths may affect scaling results.

The third factor is the type of common-item set used (Chien, Brennan, & Kolen; Hwang, Im, Si, Seong, & Kim; Kim & Lee, 2006; Tsai, Lee, & Timbol). For a test that consists of both dichotomous and polytomous items, scaling results may be improved if the common-item set also contains both item types in proportion to the total test form. In practice, however, testing programs may use only dichotomous items to link mixed-format test forms. The extent to which this decision may distort scaling results is worthy of investigation.

The last factor is the number of polytomous items included in the tests. Research indicates that polytomous items may provide more IRT information than dichotomous items and thereby improve linking results (Donoghue, 99; Kim & Lee, 2006; Samejima, 97; Thissen, 97). In this dissertation, the effect of doubling the number of polytomous items was examined under two conditions: (1) the common-item sets contain dichotomous items only, and (2) the common-item sets contain both dichotomous and polytomous items. Because the mixed-format common-item sets are arranged in proportion to the whole tests, when the number of polytomous items is doubled in the whole test, it is also doubled in the common-item set. Therefore, the common-item sets may contain different numbers of score points even when the length of the common-item set remains the same across tests containing different numbers of polytomous items.
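The score-point arithmetic behind these last two factors can be made concrete with a toy example. The item counts and the 4-point polytomous value below are invented for illustration; the study's actual test configurations appear in Chapter Three.

```python
# Score points carried by a common-item set under different compositions.
# Item counts and the 4-point polytomous value are hypothetical.

def score_points(n_dichotomous, n_polytomous, points_per_poly=4):
    """Dichotomous items carry 1 score point; polytomous items carry points_per_poly."""
    return n_dichotomous + n_polytomous * points_per_poly

dichotomous_only = score_points(20, 0)   # 20 items -> 20 points
mixed = score_points(16, 4)              # same 20-item length -> 32 points
mixed_double_poly = score_points(12, 8)  # polytomous count doubled -> 44 points

print(dichotomous_only, mixed, mixed_double_poly)
```

Even at a fixed common-item set length, doubling the polytomous items changes the number of score points the link carries, which is the effect the fourth factor isolates.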

In summary, the effects of common-item set length, type of common-item set, number of polytomous items, and sample size have been examined in previous IRT scale transformation research with simulated CI data, but usually in equating situations in which only two groups of examinees are simulated. It is unclear how these factors may affect common scales obtained from different IRT calibration methods in a vertical scaling context, in which data from five or six grade levels are often used to construct a common scale.

Evaluation Criteria

The effect of using different IRT calibration methods on the resulting vertical scales was examined using simulated unidimensional data. For these datasets, a true vertical scale was defined by specifying proficiency score means and standard deviations in advance. The performance of the four IRT calibration methods (separate, pair-wise concurrent, semi-concurrent, and concurrent) was compared with respect to growth pattern recovery under different conditions. Root mean squared error (RMSE), standard error (SE), and absolute bias were calculated for proficiency score means, standard deviations, effect sizes, and classification proportions over replications to provide interpretable evidence about the performance of each method. Formulas and procedures for computing RMSE, SE, and bias are described in more detail in Chapter Three.

Research Questions

Vertical scaling is a complicated process involving many decisions. These decisions, in turn, may affect the resulting vertical scales and the interpretation of students' growth. The literature cited here reveals a need to extend comparisons of different IRT calibration methods to simulated mixed-format multilevel test data. More

specifically, the primary goal of this dissertation was to evaluate the performance of four IRT calibration methods (separate, pair-wise concurrent, semi-concurrent, and concurrent) when unidimensionality holds in the data. This goal was accomplished through simulation analyses. After all parameter estimates were placed onto a common scale, RMSE, SE, and absolute bias were computed for proficiency score means, standard deviations, effect sizes, and classification proportions. These indexes provided information regarding which calibration method most adequately preserved the vertical scale. In addition, four control factors (length of the common-item set, type of common-item set, number of polytomous items, and sample size), which are frequently included in equating research but rarely considered in vertical scaling research, were incorporated into this study so that their accumulated effects over grades could be assessed in a systematic way.

The multilevel tests used in this dissertation included both dichotomous and polytomous items. It was assumed that the dichotomous item response data fit the three-parameter logistic model (3PLM) and the polytomous item response data fit Muraki's (1992) generalized partial credit model (GPCM). Item parameters used in this dissertation were derived from a large-scale state reading assessment program. The nature of the tests and the test construction processes are presented in more detail in Chapter Three. The research questions investigated in the simulation study are summarized below.

Research Question 1: When the IRT unidimensionality assumption holds, how does the use of different IRT calibration methods (separate, pair-wise concurrent, semi-concurrent, and concurrent) affect the accuracy of growth pattern recovery?

Research Question 2: To what degree is error decreased when the sample size is increased?

Research Question 3: How much is error reduced when the length of the common-item set is doubled?

Research Question 4: Is error reduced or enlarged when the type of common-item set is changed from dichotomous-only to mixed-format?

Research Question 5: Is error reduced or enlarged when the number of polytomous items is doubled in both the whole test and the mixed-format common-item set?

Research Question 6: Is error reduced or enlarged when the number of polytomous items is doubled in tests containing a dichotomous-item-only common-item set?
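Answering these questions rests on the three error indexes introduced above. Below is a minimal sketch of how absolute bias, SE, and RMSE could be computed for a single criterion parameter (say, one grade's proficiency mean) across replications; the replicate estimates and true value are hypothetical.

```python
import math

def bias_se_rmse(estimates, truth):
    """Absolute bias, standard error, and RMSE of replicate estimates around the true value."""
    n = len(estimates)
    mean_est = sum(estimates) / n
    abs_bias = abs(mean_est - truth)
    se = math.sqrt(sum((e - mean_est) ** 2 for e in estimates) / n)
    rmse = math.sqrt(sum((e - truth) ** 2 for e in estimates) / n)
    return abs_bias, se, rmse

# Hypothetical grade-mean estimates over six replications; true mean 0.30
estimates = [0.28, 0.33, 0.31, 0.27, 0.35, 0.29]
abs_bias, se, rmse = bias_se_rmse(estimates, 0.30)
# The indexes decompose as RMSE^2 = SE^2 + bias^2
print(abs_bias, se, rmse)
```

SE captures sampling variability across replications, bias captures systematic displacement, and RMSE aggregates both, which is why all three are reported.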

CHAPTER II. LITERATURE REVIEW

Research into methods for constructing psychometrically defensible vertical scales has increased since the implementation of the No Child Left Behind (NCLB) Act of 2001. This legislation requires that states annually test students in grades 3 through 8 according to state reading and mathematics standards. Under NCLB, students' growth from one grade level to the next is a key component of state accountability systems. Tracking the progress of individual students requires that scores from different grade-level tests be placed on a common scale, and vertical scaling provides a technique for establishing such a scale. However, vertical scales can be constructed using various approaches with dissimilar characteristics (e.g., grade-to-grade growth, grade-to-grade variability, and the separation of grade distributions) that may lead to different or even contradictory interpretations of students' growth. Such interpretations may have considerable consequences for student and school accountability. Establishing a valid vertical scale is therefore a prerequisite for appropriately monitoring student academic progress over years.

Although numerous studies have focused on vertical scaling, most recent ones have been conducted under an item response theory (IRT) framework (see, e.g., Tong, 2005). These studies provided a solid foundation for the research questions examined in the simulation study conducted in this dissertation. In the study, mixed-format test datasets were simulated for grades 3 through 8, and the performance of four IRT calibration methods (separate, pair-wise concurrent, semi-concurrent, and concurrent) was compared when the IRT unidimensionality assumption held. In addition, four control factors (length of the common-item set, type of common-item set, number of

polytomous items, and sample size) were incorporated to investigate their effects on the resulting scales.

In this chapter, literature regarding IRT scaling methods is reviewed. More specifically, two commonly used IRT models are illustrated, followed by discussions of the nature of IRT scales, calibration methods, and proficiency score estimation methods. Then, IRT vertical scale evaluation criteria used in previous studies are discussed. Finally, literature on the common-item design is reviewed to justify the selection of the four control factors investigated in this dissertation.

IRT Scaling Methods

IRT models are used increasingly in creating vertical scales for multilevel test forms. However, compared to traditional scaling methods (e.g., Thurstone and Hieronymus), IRT scaling imposes more stringent assumptions on the data. In addition, conducting IRT scaling involves making decisions about the selection of data collection designs, the choice of IRT models, and the choice of both proficiency score estimation methods and item calibration procedures. The resulting IRT scales may differ when the assumptions do not hold or when different choices are made for any of the aforementioned factors. Nevertheless, the item-invariance and group-invariance (Lord, 1980) features of IRT models provide testing programs with strong incentives for using an IRT scaling method.

IRT Models

The family of IRT models has grown rapidly due to advances in statistical theory and the technology for implementing it. Numerous IRT mathematical models exist and can be used to describe students' proficiency scores, with eight to ten of the

models in wide use in the testing field (van der Linden & Hambleton, 1997). IRT models can be unidimensional or multidimensional. In unidimensional IRT models, examinee proficiency scores are explained by a single latent variable $\theta$ in the range $(-\infty, \infty)$. In multidimensional IRT models, more than a single proficiency construct is necessary to adequately account for examinee test performance. IRT models can be used for dichotomous item responses or for polytomous item responses. Widely used dichotomous IRT models include the Rasch model (Rasch, 1960; Wright & Stone, 1979), the two-parameter logistic model (Birnbaum, 1968; Lord, 1958), and the three-parameter logistic model (Lord, 1980). Polytomous IRT models often employed in testing programs include Muraki's generalized partial credit model (1992), Samejima's graded response model (1997), and Bock's nominal model (1997). These IRT models differ in how the functional form of the item characteristic curve (ICC), or item characteristic function (ICF), relates the probability of success on an item to the examinee proficiency score. The ICC or ICF is a nonlinear regression of the item score on the proficiency score measured by the test. This regression is referred to as an ICC for unidimensional IRT models and as an ICF for multidimensional IRT models. In this dissertation, two IRT models, the three-parameter logistic model (3PLM) and the generalized partial credit model (GPCM), were employed to generate data. These models are reviewed in detail in the sections that follow.

The Three-parameter Logistic Model (3PLM)

In the three-parameter logistic model (Lord, 1980), the probability of a randomly selected examinee $i$ at proficiency score level $\theta_i$ answering item $j$ correctly is defined as

$$P(\theta_i \mid a_j, b_j, c_j) = c_j + (1 - c_j)\,\frac{\exp[D a_j(\theta_i - b_j)]}{1 + \exp[D a_j(\theta_i - b_j)]}, \qquad (2\text{-}1)$$

where $a_j$ is called the discrimination or slope parameter, $b_j$ is referred to as the difficulty or threshold parameter, $c_j$ is the lower asymptote or pseudo-guessing parameter (included in the model to account for item response data from very low-ability examinees), and $D$ is a scaling constant (typically 1.7). A large value of the $a$-parameter reflects high discrimination power of the item. Larger values of the $b$-parameter are associated with more difficult items. The $c$-parameter for an item is usually in the range of 0 to 0.25 for a multiple-choice question with four options (Lord, 1980). When the 3PLM is used, variations in item difficulty, discrimination, and guessing are accounted for simultaneously.

The Generalized Partial Credit Model (GPCM)

The GPCM, credited to Muraki (1992), is one of the most frequently used models for polytomous test data. In this model, the probability that an examinee $i$ at proficiency level $\theta_i$ scores $k$ on item $j$ is expressed as

$$P_{ijk}(\theta_i \mid a_j, b_j, d_{j1}, d_{j2}, \ldots, d_{jm_j}) = \frac{\exp\left[\sum_{v=1}^{k} D a_j(\theta_i - b_j + d_{jv})\right]}{\sum_{c=1}^{m_j} \exp\left[\sum_{v=1}^{c} D a_j(\theta_i - b_j + d_{jv})\right]}, \qquad (2\text{-}2)$$

where $m_j$ is the number of categories in the response to item $j$, $a_j$ is the discrimination parameter, $b_j$ is the difficulty parameter, $d_{jv}$ is the item category parameter for category $v$, and $D$ is a scaling constant (typically 1.7). In addition, two constraints are necessary for using this model: (1) setting $d_{j1} = 0$ and (2) setting $\sum_{k=2}^{m_j} d_{jk} = 0$.
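Both response functions translate directly into code. The sketch below implements the two probabilities as written; the parameter values in the example calls are illustrative only.

```python
import math

D = 1.7  # scaling constant

def p_3pl(theta, a, b, c):
    """3PLM probability of a correct response."""
    z = math.exp(D * a * (theta - b))
    return c + (1.0 - c) * z / (1.0 + z)

def p_gpcm(theta, a, b, d):
    """GPCM category probabilities for score categories 1..m_j; d[0] (= d_j1) must be 0."""
    m = len(d)
    # Cumulative sums over v = 1..k of D * a * (theta - b + d_jv)
    sums = [sum(D * a * (theta - b + d[v]) for v in range(k + 1)) for k in range(m)]
    numerators = [math.exp(s) for s in sums]
    total = sum(numerators)
    return [n / total for n in numerators]

# At theta = b, the 3PLM reduces to c + (1 - c)/2
print(round(p_3pl(0.0, a=1.2, b=0.0, c=0.2), 6))  # 0.6
# GPCM category probabilities sum to 1; d satisfies d[0] = 0 and sum(d[1:]) = 0
print(sum(p_gpcm(0.5, a=0.9, b=0.1, d=[0.0, 0.6, -0.6])))
```

Generating simulated responses then amounts to comparing these probabilities against uniform random draws, item by item and examinee by examinee.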


More information

MODEL SELECTION AND MODEL AVERAGING IN THE PRESENCE OF MISSING VALUES

MODEL SELECTION AND MODEL AVERAGING IN THE PRESENCE OF MISSING VALUES UNIVERSITY OF GLASGOW MODEL SELECTION AND MODEL AVERAGING IN THE PRESENCE OF MISSING VALUES by KHUNESWARI GOPAL PILLAY A thesis submitted in partial fulfillment for the degree of Doctor of Philosophy in

More information

A logistic function of a monotonic polynomial for estimating item response functions

A logistic function of a monotonic polynomial for estimating item response functions A logistic function of a monotonic polynomial for estimating item response functions Carl F. Falk and Li Cai University of California, Los Angeles IMPS 2013 Carl F. Falk & Li Cai (UCLA) Mono Poly 1 IMPS

More information

CONCENTRATIONS: HIGH-PERFORMANCE COMPUTING & BIOINFORMATICS CYBER-SECURITY & NETWORKING

CONCENTRATIONS: HIGH-PERFORMANCE COMPUTING & BIOINFORMATICS CYBER-SECURITY & NETWORKING MAJOR: DEGREE: COMPUTER SCIENCE MASTER OF SCIENCE (M.S.) CONCENTRATIONS: HIGH-PERFORMANCE COMPUTING & BIOINFORMATICS CYBER-SECURITY & NETWORKING The Department of Computer Science offers a Master of Science

More information

Statistical Methods for the Analysis of Repeated Measurements

Statistical Methods for the Analysis of Repeated Measurements Charles S. Davis Statistical Methods for the Analysis of Repeated Measurements With 20 Illustrations #j Springer Contents Preface List of Tables List of Figures v xv xxiii 1 Introduction 1 1.1 Repeated

More information

ENGINEERING AND TECHNOLOGY MANAGEMENT

ENGINEERING AND TECHNOLOGY MANAGEMENT Engineering and Technology Management 1 ENGINEERING AND TECHNOLOGY MANAGEMENT Master of Science in Engineering Technology Management Tim Hardin, PhD Director Brenda L. Johnson, MS Assistant Director OSU

More information

How Is the CPA Exam Scored? Prepared by the American Institute of Certified Public Accountants

How Is the CPA Exam Scored? Prepared by the American Institute of Certified Public Accountants How Is the CPA Exam Scored? Prepared by the American Institute of Certified Public Accountants Questions pertaining to this decision paper should be directed to Carie Chester, Office Administrator, Exams

More information

Knowledge libraries and information space

Knowledge libraries and information space University of Wollongong Research Online University of Wollongong Thesis Collection 1954-2016 University of Wollongong Thesis Collections 2009 Knowledge libraries and information space Eric Rayner University

More information

An Experiment in Visual Clustering Using Star Glyph Displays

An Experiment in Visual Clustering Using Star Glyph Displays An Experiment in Visual Clustering Using Star Glyph Displays by Hanna Kazhamiaka A Research Paper presented to the University of Waterloo in partial fulfillment of the requirements for the degree of Master

More information

GUIDELINES FOR MASTER OF SCIENCE INTERNSHIP THESIS

GUIDELINES FOR MASTER OF SCIENCE INTERNSHIP THESIS GUIDELINES FOR MASTER OF SCIENCE INTERNSHIP THESIS Dear Participant of the MScIS Program, If you have chosen to follow an internship, one of the requirements is to write a Thesis. This document gives you

More information

Student retention in distance education using on-line communication.

Student retention in distance education using on-line communication. Doctor of Philosophy (Education) Student retention in distance education using on-line communication. Kylie Twyford AAPI BBus BEd (Hons) 2007 Certificate of Originality I certify that the work in this

More information

Integrated Algebra 2 and Trigonometry. Quarter 1

Integrated Algebra 2 and Trigonometry. Quarter 1 Quarter 1 I: Functions: Composition I.1 (A.42) Composition of linear functions f(g(x)). f(x) + g(x). I.2 (A.42) Composition of linear and quadratic functions II: Functions: Quadratic II.1 Parabola The

More information

CITY UNIVERSITY OF NEW YORK. i. Visit:

CITY UNIVERSITY OF NEW YORK. i. Visit: CITY UNIVERSITY OF NEW YORK I. ACCESSING IRB NET (New Registration) i. Visit: https://www.irbnet.org/release/index.html ii. New users: Click on New Registration in the top right corner iii. Fill-out the

More information

ASSIUT UNIVERSITY. Faculty of Computers and Information Department of Information Systems. IS Ph.D. Program. Page 0

ASSIUT UNIVERSITY. Faculty of Computers and Information Department of Information Systems. IS Ph.D. Program. Page 0 ASSIUT UNIVERSITY Faculty of Computers and Information Department of Information Systems Informatiio on Systems PhD Program IS Ph.D. Program Page 0 Assiut University Faculty of Computers & Informationn

More information

Implications of Post-NCSC Project Scenarios for Future Test Development

Implications of Post-NCSC Project Scenarios for Future Test Development Implications of Post-NCSC Project Scenarios for Future Test Development Brian Gong Center for Assessment All rights reserved. Any or all portions of this document may be used to support additional study

More information

Curriculum Outcome Assesment using Subject Matter on the FE Examination.

Curriculum Outcome Assesment using Subject Matter on the FE Examination. Session : Curriculum Outcome Assesment using Subject Matter on the FE Examination. Enno Ed Koehn, Ramakanth Mandalika Lamar University Abstract: In engineering education, assessment has become a major

More information

CRITERIA FOR ACCREDITING COMPUTING PROGRAMS

CRITERIA FOR ACCREDITING COMPUTING PROGRAMS CRITERIA FOR ACCREDITING COMPUTING PROGRAMS Effective for Reviews During the 2014-2015 Accreditation Cycle Incorporates all changes approved by the ABET Board of Directors as of October 26, 2013 Computing

More information

Traffic Analysis on Business-to-Business Websites. Masterarbeit

Traffic Analysis on Business-to-Business Websites. Masterarbeit Traffic Analysis on Business-to-Business Websites Masterarbeit zur Erlangung des akademischen Grades Master of Science (M. Sc.) im Studiengang Wirtschaftswissenschaft der Wirtschaftswissenschaftlichen

More information

3. CENTRAL TENDENCY MEASURES AND OTHER CLASSICAL ITEM ANALYSES OF THE 2011 MOD-MSA: MATHEMATICS

3. CENTRAL TENDENCY MEASURES AND OTHER CLASSICAL ITEM ANALYSES OF THE 2011 MOD-MSA: MATHEMATICS 3. CENTRAL TENDENCY MEASURES AND OTHER CLASSICAL ITEM ANALYSES OF THE 2011 MOD-MSA: MATHEMATICS This section provides central tendency statistics and results of classical statistical item analyses for

More information

ASSIUT UNIVERSITY. Faculty of Computers and Information Department of Information Technology. on Technology. IT PH.D. Program.

ASSIUT UNIVERSITY. Faculty of Computers and Information Department of Information Technology. on Technology. IT PH.D. Program. ASSIUT UNIVERSITY Faculty of Computers and Information Department of Information Technology Informatiio on Technology PhD Program IT PH.D. Program Page 0 Assiut University Faculty of Computers & Informationn

More information

ADAPTIVE TILE CODING METHODS FOR THE GENERALIZATION OF VALUE FUNCTIONS IN THE RL STATE SPACE A THESIS SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL

ADAPTIVE TILE CODING METHODS FOR THE GENERALIZATION OF VALUE FUNCTIONS IN THE RL STATE SPACE A THESIS SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL ADAPTIVE TILE CODING METHODS FOR THE GENERALIZATION OF VALUE FUNCTIONS IN THE RL STATE SPACE A THESIS SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL OF THE UNIVERSITY OF MINNESOTA BY BHARAT SIGINAM IN

More information

Requirements for Forensic Photography & Imaging Certification (08/2017)

Requirements for Forensic Photography & Imaging Certification (08/2017) Requirements for Forensic Photography & Imaging Certification (08/2017) A. General Requirements 1. An applicant for certification must possess a high ethical and professional standing. 2. All applicants

More information

GEO BASED ROUTING FOR BORDER GATEWAY PROTOCOL IN ISP MULTI-HOMING ENVIRONMENT

GEO BASED ROUTING FOR BORDER GATEWAY PROTOCOL IN ISP MULTI-HOMING ENVIRONMENT GEO BASED ROUTING FOR BORDER GATEWAY PROTOCOL IN ISP MULTI-HOMING ENVIRONMENT Duleep Thilakarathne (118473A) Degree of Master of Science Department of Electronic and Telecommunication Engineering University

More information

MASTER OF SCIENCE (M.S.) MAJOR IN COMPUTER SCIENCE (NON-THESIS OPTION)

MASTER OF SCIENCE (M.S.) MAJOR IN COMPUTER SCIENCE (NON-THESIS OPTION) Master of Science (M.S.) Major in Computer Science (Non-thesis Option) 1 MASTER OF SCIENCE (M.S.) MAJOR IN COMPUTER SCIENCE (NON-THESIS OPTION) Major Program The Master of Science (M.S.) degree with a

More information

ANNUAL PROGRAM REPORT. Multiple and Single Subject Credential Programs. Credential programs are not subject to 5 year reviews

ANNUAL PROGRAM REPORT. Multiple and Single Subject Credential Programs. Credential programs are not subject to 5 year reviews ACADEMIC SENATE Committee on Academic Planning and Review ANNUAL PROGRAM REPORT College Department Program CEAS Teacher Education Multiple and Single Subject Credential Programs Reporting for Academic

More information

A Beginner's Guide to. Randall E. Schumacker. The University of Alabama. Richard G. Lomax. The Ohio State University. Routledge

A Beginner's Guide to. Randall E. Schumacker. The University of Alabama. Richard G. Lomax. The Ohio State University. Routledge A Beginner's Guide to Randall E. Schumacker The University of Alabama Richard G. Lomax The Ohio State University Routledge Taylor & Francis Group New York London About the Authors Preface xv xvii 1 Introduction

More information

Heteroskedasticity and Homoskedasticity, and Homoskedasticity-Only Standard Errors

Heteroskedasticity and Homoskedasticity, and Homoskedasticity-Only Standard Errors Heteroskedasticity and Homoskedasticity, and Homoskedasticity-Only Standard Errors (Section 5.4) What? Consequences of homoskedasticity Implication for computing standard errors What do these two terms

More information

Automated Item Banking and Test Development Model used at the SSAC.

Automated Item Banking and Test Development Model used at the SSAC. Automated Item Banking and Test Development Model used at the SSAC. Tural Mustafayev The State Student Admission Commission of the Azerbaijan Republic Item Bank Department Item Banking For many years tests

More information

Analysis of Panel Data. Third Edition. Cheng Hsiao University of Southern California CAMBRIDGE UNIVERSITY PRESS

Analysis of Panel Data. Third Edition. Cheng Hsiao University of Southern California CAMBRIDGE UNIVERSITY PRESS Analysis of Panel Data Third Edition Cheng Hsiao University of Southern California CAMBRIDGE UNIVERSITY PRESS Contents Preface to the ThirdEdition Preface to the Second Edition Preface to the First Edition

More information

PECT Test Development Process and Test Preparation. May 2013

PECT Test Development Process and Test Preparation. May 2013 PECT Test Development Process and Test Preparation May 2013 Program Background In May 2007, the Pennsylvania State Board of Education approved an amended version of Chapter 49-2. The amended regulations:

More information

PhD Candidacy Exam Overview

PhD Candidacy Exam Overview EDIC - Doctoral Program in Computer & Communication Sciences PhD Candidacy Exam Overview https://phd.epfl.ch/edic/candidacyexams Candidacy exam background The philosophy After your 1 st year of PhD you

More information

Buros Center for Testing. Standards for Accreditation of Testing Programs

Buros Center for Testing. Standards for Accreditation of Testing Programs Buros Center for Testing Standards for Accreditation of Testing Programs Improving the Science and Practice of Testing www.buros.org Copyright 2017 The Board of Regents of the University of Nebraska and

More information

The Structure and Properties of Clique Graphs of Regular Graphs

The Structure and Properties of Clique Graphs of Regular Graphs The University of Southern Mississippi The Aquila Digital Community Master's Theses 1-014 The Structure and Properties of Clique Graphs of Regular Graphs Jan Burmeister University of Southern Mississippi

More information

Building Better Parametric Cost Models

Building Better Parametric Cost Models Building Better Parametric Cost Models Based on the PMI PMBOK Guide Fourth Edition 37 IPDI has been reviewed and approved as a provider of project management training by the Project Management Institute

More information

Clustering. Robert M. Haralick. Computer Science, Graduate Center City University of New York

Clustering. Robert M. Haralick. Computer Science, Graduate Center City University of New York Clustering Robert M. Haralick Computer Science, Graduate Center City University of New York Outline K-means 1 K-means 2 3 4 5 Clustering K-means The purpose of clustering is to determine the similarity

More information

Business Intelligence Roadmap HDT923 Three Days

Business Intelligence Roadmap HDT923 Three Days Three Days Prerequisites Students should have experience with any relational database management system as well as experience with data warehouses and star schemas. It would be helpful if students are

More information

6th Grade Advanced Math Algebra

6th Grade Advanced Math Algebra 6th Grade Advanced Math Algebra If your student is considering a jump from 6th Advanced Math to Algebra, please be advised of the following gaps in instruction. 19 of the 7th grade mathematics TEKS and

More information

EXAM PREPARATION GUIDE

EXAM PREPARATION GUIDE When Recognition Matters EXAM PREPARATION GUIDE PECB Certified ISO 22000 Lead Implementer www.pecb.com The objective of the Certified ISO 22000 Lead Implementer examination is to ensure that the candidate

More information

About the course.

About the course. 1 About the course www.sheffield.ac.uk/is Skills relevant to your career Our MSc in Information Systems provides you with the practical knowledge you need in the fastgrowing field of information systems.

More information

On the Near-Optimality of List Scheduling Heuristics for Local and Global Instruction Scheduling

On the Near-Optimality of List Scheduling Heuristics for Local and Global Instruction Scheduling On the Near-Optimality of List Scheduling Heuristics for Local and Global Instruction Scheduling by John Michael Chase A thesis presented to the University of Waterloo in fulfillment of the thesis requirement

More information

Missing Data Techniques

Missing Data Techniques Missing Data Techniques Paul Philippe Pare Department of Sociology, UWO Centre for Population, Aging, and Health, UWO London Criminometrics (www.crimino.biz) 1 Introduction Missing data is a common problem

More information

Relationships and Properties of Polytomous Item Response Theory Models

Relationships and Properties of Polytomous Item Response Theory Models Relationships and Properties of Polytomous Item Response Theory Models L. Andries van der Ark Tilburg University Relationships among twenty polytomous item response theory (IRT) models (parametric and

More information

UNIVERSITI TEKNOLOGI MARA A PROCEDURAL FRAMEWORK FOR EXTENSION OF TIME (EOT) CLAIM SETTLEMENT IN THE MALAYSIAN CONSTRUCTION INDUSTRY

UNIVERSITI TEKNOLOGI MARA A PROCEDURAL FRAMEWORK FOR EXTENSION OF TIME (EOT) CLAIM SETTLEMENT IN THE MALAYSIAN CONSTRUCTION INDUSTRY UNIVERSITI TEKNOLOGI MARA A PROCEDURAL FRAMEWORK FOR EXTENSION OF TIME (EOT) CLAIM SETTLEMENT IN THE MALAYSIAN CONSTRUCTION INDUSTRY NORAZIAN MOHAMAD YUSUWAN Thesis submitted in fulfilment of the requirement

More information

Differential Item Functioning Analyses with STDIF: User s Guide April L. Zenisky, Frédéric Robin, and Ronald K. Hambleton [Version 6/15/2009]

Differential Item Functioning Analyses with STDIF: User s Guide April L. Zenisky, Frédéric Robin, and Ronald K. Hambleton [Version 6/15/2009] Differential Item Functioning Analyses with STDIF: User s Guide April L. Zenisky, Frédéric Robin, and Ronald K. Hambleton [Version 6/5/2009] Part I: Introduction to the Mechanics of SDIF and UDIF STDIF

More information

APPENDIX D: New Hampshire Recertification Law

APPENDIX D: New Hampshire Recertification Law APPENDIX D: New Hampshire Recertification Law SAU #16 Professional Development Master Plan July 2007 June 2012 ED 512 STAFF DEVELOPMENT AND RECERTIFICATION EF.07/01/05 Ed 512.01 Basic Requirement. Each

More information

Albertson AP Calculus AB AP CALCULUS AB SUMMER PACKET DUE DATE: The beginning of class on the last class day of the first week of school.

Albertson AP Calculus AB AP CALCULUS AB SUMMER PACKET DUE DATE: The beginning of class on the last class day of the first week of school. Albertson AP Calculus AB Name AP CALCULUS AB SUMMER PACKET 2017 DUE DATE: The beginning of class on the last class day of the first week of school. This assignment is to be done at you leisure during the

More information

Methods and Models for the Construction of Weakly Parallel Tests

Methods and Models for the Construction of Weakly Parallel Tests Methods and Models for the Construction of Weakly Parallel Tests Jos J. Adema University of Twente Several methods are proposed for the construction of weakly parallel tests [i.e., tests with the same

More information

You will choose to study units from one of four specialist pathways depending on the career you wish to pursue. The four pathways are:

You will choose to study units from one of four specialist pathways depending on the career you wish to pursue. The four pathways are: Qualification Title: OCR Level 3 Cambridge Technical Diploma in IT Qualification Number: 601/7101/7 Overview This qualification is designed for you if you re 16 years old or over and prefer to study IT

More information

Chong Ho Yu, Ph.D., MCSE, CNE. Paper presented at the annual meeting of the American Educational Research Association, 2001, Seattle, WA

Chong Ho Yu, Ph.D., MCSE, CNE. Paper presented at the annual meeting of the American Educational Research Association, 2001, Seattle, WA RUNNING HEAD: On-line assessment Developing Data Systems to Support the Analysis and Development of Large-Scale, On-line Assessment Chong Ho Yu, Ph.D., MCSE, CNE Paper presented at the annual meeting of

More information

I How does the formulation (5) serve the purpose of the composite parameterization

I How does the formulation (5) serve the purpose of the composite parameterization Supplemental Material to Identifying Alzheimer s Disease-Related Brain Regions from Multi-Modality Neuroimaging Data using Sparse Composite Linear Discrimination Analysis I How does the formulation (5)

More information

CCSSM Curriculum Analysis Project Tool 1 Interpreting Functions in Grades 9-12

CCSSM Curriculum Analysis Project Tool 1 Interpreting Functions in Grades 9-12 Tool 1: Standards for Mathematical ent: Interpreting Functions CCSSM Curriculum Analysis Project Tool 1 Interpreting Functions in Grades 9-12 Name of Reviewer School/District Date Name of Curriculum Materials:

More information

TRI para escalas politômicas. Dr. Ricardo Primi Programa de Mestrado e Doutorado em Avaliação Psicológica Universidade São Francisco

TRI para escalas politômicas. Dr. Ricardo Primi Programa de Mestrado e Doutorado em Avaliação Psicológica Universidade São Francisco TRI para escalas politômicas Dr. Ricardo Primi Programa de Mestrado e Doutorado em Avaliação Psicológica Universidade São Francisco Modelos Modelo Rasch-Andrich Rating Scale Model (respostas graduais)

More information

Package birtr. October 4, 2017

Package birtr. October 4, 2017 Package birtr October 4, 2017 Title The R Package for ``The Basics of Item Response Theory Using R'' Version 1.0.0 Maintainer Seock-Ho Kim R functions for ``The Basics of Item Response

More information

Research on Industrial Security Theory

Research on Industrial Security Theory Research on Industrial Security Theory Menggang Li Research on Industrial Security Theory Menggang Li China Centre for Industrial Security Research Beijing, People s Republic of China ISBN 978-3-642-36951-3

More information

2018 MSIP5 District/Charter Transitional APR Supporting Data Report - Public RAYMORE-PECULIAR R-II (019142)

2018 MSIP5 District/Charter Transitional APR Supporting Data Report - Public RAYMORE-PECULIAR R-II (019142) 1. Academic Achievement English Language Arts Metric 2016 2017 * 2018 * Status 16.0 12.0 371.3 On Track 74.3% 381.7 72.1 70.2% 370.9 66.4 55.6% 361.3 66 Progress 12.0 0.0-3.1 Floor Prior 2 Yr Avg = 69.3

More information

2007 Annual Summary. Board of Certification (BOC) Certification Examination for Athletic Trainers. CASTLE Worldwide, Inc.

2007 Annual Summary. Board of Certification (BOC) Certification Examination for Athletic Trainers. CASTLE Worldwide, Inc. 2007 Annual Summary Board of Certification (BOC) Certification Examination for Athletic Trainers CASTLE Worldwide, Inc. April, 2008 Introduction The Board of Certification (BOC) is a nonprofit credentialing

More information

Numerical analysis and comparison of distorted fingermarks from the same source. Bruce Comber

Numerical analysis and comparison of distorted fingermarks from the same source. Bruce Comber Numerical analysis and comparison of distorted fingermarks from the same source Bruce Comber This thesis is submitted pursuant to a Master of Information Science (Research) at the University of Canberra

More information

MSE Comprehensive Exam

MSE Comprehensive Exam MSE Comprehensive Exam The MSE requires a comprehensive examination, which is quite general in nature. It is administered on the sixth Friday of the semester, consists of a written exam in the major area

More information

An Introduction to Growth Curve Analysis using Structural Equation Modeling

An Introduction to Growth Curve Analysis using Structural Equation Modeling An Introduction to Growth Curve Analysis using Structural Equation Modeling James Jaccard New York University 1 Overview Will introduce the basics of growth curve analysis (GCA) and the fundamental questions

More information

certification.setac.org Certification Contact of Environmental Risk Assessors Phone: certification.setac.

certification.setac.org Certification Contact of Environmental Risk Assessors Phone: certification.setac. certification.setac.org Certification Contact Phone: +32 2 772 72 81 Email: CRA@setac.org certification.setac.org of Environmental Risk Assessors The SETAC Europe Certification of Environmental Risk Assessors

More information

Comparison of the Nonparametric

Comparison of the Nonparametric Comparison of the Nonparametric Mokken Model and Parametric IRT Models Using Latent Class Analysis Dato N. M. de Gruijter Leiden University A nonparametric Mokken analysis of test data generally results

More information