A comparison of smoothing methods for the common item nonequivalent groups design


University of Iowa
Iowa Research Online
Theses and Dissertations
Summer 2014

A comparison of smoothing methods for the common item nonequivalent groups design

Han Yi Kim, University of Iowa

Copyright 2014 Han Yi Kim. This dissertation is available at Iowa Research Online.

Recommended Citation: Kim, Han Yi. "A comparison of smoothing methods for the common item nonequivalent groups design." PhD (Doctor of Philosophy) thesis, University of Iowa, 2014.

Part of the Educational Psychology Commons

A COMPARISON OF SMOOTHING METHODS FOR THE COMMON ITEM NONEQUIVALENT GROUPS DESIGN

by Han Yi Kim

A thesis submitted in partial fulfillment of the requirements for the Doctor of Philosophy degree in Psychological and Quantitative Foundations (Educational Measurement and Statistics) in the Graduate College of The University of Iowa

August 2014

Thesis Supervisors: Professor Walter P. Vispoel and Associate Professor Won-Chan Lee

Copyright by HAN YI KIM 2014. All Rights Reserved.

Graduate College
The University of Iowa
Iowa City, Iowa

CERTIFICATE OF APPROVAL

PH.D. THESIS

This is to certify that the Ph.D. thesis of Han Yi Kim has been approved by the Examining Committee for the thesis requirement for the Doctor of Philosophy degree in Psychological and Quantitative Foundations (Educational Measurement and Statistics) at the August 2014 graduation.

Thesis Committee: Walter P. Vispoel, Thesis Supervisor; Won-Chan Lee, Thesis Supervisor; Robert L. Brennan; Mary Kathryn Cowles; Catherine J. Welch

ACKNOWLEDGMENTS

Looking back, once again I realize that I am surrounded by some great people, the people that I love and trust. The whole process of writing this dissertation and completing the graduate program could not have been done without the enormous support I got from them. I would like to extend my sincere gratitude to my committee chairs, Dr. Walter Vispoel and Dr. Won-Chan Lee. Dr. Walter Vispoel has been a wonderful mentor, advisor, and supervisor for the six years I have spent in Iowa. He has gently guided me into the world of academia, and he has been a consistent source of moral support throughout the years. I cannot thank him enough for his efforts and consideration, especially as a dissertation committee co-chair. If this dissertation succeeds in communicating what I have discovered to other researchers and practitioners, Dr. Vispoel deserves some credit for this clarity. Dr. Won-Chan Lee also deserves my deepest appreciation for pushing me to think more deeply about the issues dealt with in this dissertation. He showed endless patience with me and strong belief in my project, so that I could move forward even when I felt down and lost. I also want to thank Dr. Robert Brennan, Dr. Catherine Welch, and Dr. Mary Kathryn Cowles for serving as my committee members. Dr. Robert Brennan has provided constructive feedback on numerous occasions to better the arguments made in this dissertation, and he has supported me through the process of searching for my first job. Dr. Catherine Welch has helped me think about the issue of smoothing from the practitioner's perspective, and also lifted my spirits with her smile every time we bumped into each other. Dr. Mary Kathryn Cowles has given me the greatest tool, R, to make this dissertation happen.

I would also like to express my gratitude to my friends, especially the ones who have always been there for me from the beginning of this journey: Eunjung Lee (Miss you!), Ja Young Kim, Hyung Jin Kim, Ah Young Shin, and Jee Hyang Lee. I am also thankful to the HPY members and birthday party members for the precious feedback on my research ideas and the moral support. I would also like to thank the people who were concerned for my wellbeing throughout the writing process. My family deserves a significant amount of the credit for everything I have accomplished. My parents, Jae-Woong Kim and Hye-Young Jung, have always been supporting me with their prayers and they have unconditionally loved me for almost thirty years. Jessica Kooksoon Kim, my younger sister, also deserves my full appreciation. She has been there for me during my ups and downs for the last six years we lived together here in Iowa. I also want to thank my four-legged best friend, Sshong, for being on my side when it feels like no one else is. Last, but not least, I thank God for all he has given. He has given me strength and courage to overcome the challenges I have met, and the people I can turn to when I get lost. "You have searched me, Lord, and you know me. You know when I sit and when I rise; you perceive my thoughts from afar. You discern my going out and my lying down; you are familiar with all my ways" (Psalm 139:1-3, New International Version).

ABSTRACT

The purpose of this study was to compare the relative performance of various smoothing methods under the common item nonequivalent groups (CINEG) design. In light of the previous literature on smoothing under the CINEG design, this study was designed to provide general guidelines and practical insights into the selection of smoothing procedures under specific testing conditions. To investigate the smoothing procedures, 100 replications were simulated under various testing conditions by using an item response theory (IRT) framework. A total of 192 conditions were investigated (3 sample sizes × 4 group ability differences × 2 common-item proportions × 2 form difficulty differences × 1 test length × 2 common-item types × 2 spreads of common-item difficulty). Two smoothing methods, log-linear presmoothing and cubic spline postsmoothing, were considered with four equating methods: frequency estimation (FE), modified frequency estimation (MFE), chained equipercentile equating (CE), and kernel equating (KE). Bias, standard error, and root mean square error were computed to evaluate the performance of the smoothing methods. Results showed that: (1) there was always one or more smoothing methods that produced smaller total error than unsmoothed methods; (2) polynomial log-linear presmoothing tended to perform better than cubic spline postsmoothing in terms of systematic and total errors when FE or MFE were used; (3) cubic spline postsmoothing showed a strong tendency to produce the least amount of random error regardless of the equating method used; (4) KE produced more accurate equating relationships under a majority of testing conditions when paired with CE; and (5) log-linear presmoothing produced smaller total error under more testing conditions than did cubic spline postsmoothing. Tables are provided to show the best-performing smoothing method for all combinations of testing conditions considered.

TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES

CHAPTER ONE INTRODUCTION
    Equating Designs, Methods, and Assumptions
    Smoothing
    Purposes and Study Design
    Research Questions

CHAPTER TWO LITERATURE REVIEW
    Equating Design
    Equating Method
        Equipercentile Equating
        Frequency Estimation (FE) Equipercentile Equating
        Modified Frequency Estimation (MFE) Equating
        Chained Equipercentile (CE) Equating
        Kernel Equating (KE)
    Smoothing Method
        Polynomial Log-Linear Presmoothing
        Strong True Score Presmoothing
        Cubic Spline Postsmoothing
    Review of Relevant Literature
        Factors Affecting Equating Under the CINEG Design
        Reduction in Equating Error by Smoothing
        Factors Affecting Smoothing in Test Equating
        Comparing the Performance of Smoothing Methods
    Summary of the Literature Review

CHAPTER THREE METHODOLOGY
    Study Design
        Factors Considered
        Equating Methods Studied
        Computer Software
    Simulation
        Form Construction
        Criterion Equating Relationship
        Procedure
    Evaluation Criteria

CHAPTER FOUR RESULTS
    Sample Size
        Frequency Estimation Equipercentile Equating
        Modified Frequency Estimation Equipercentile Equating
        Chained Equipercentile Equating
        Summary
    Group Mean Difference
        Frequency Estimation Equipercentile Equating
        Modified Frequency Estimation Equipercentile Equating
        Chained Equipercentile Equating
        Summary
    Proportion of Common Items
        Frequency Estimation Equipercentile Equating
        Modified Frequency Estimation Equipercentile Equating
        Chained Equipercentile Equating
        Summary
    Form Difference
        Frequency Estimation Equipercentile Equating
        Modified Frequency Estimation Equipercentile Equating
        Chained Equipercentile Equating
        Summary
    Common Item Type
        Frequency Estimation Equipercentile Equating
        Modified Frequency Estimation Equipercentile Equating
        Chained Equipercentile Equating
        Summary
    Spread of Difficulty Parameters
        Frequency Estimation Equipercentile Equating
        Modified Frequency Estimation Equipercentile Equating
        Chained Equipercentile Equating
        Summary
    Smoothing Parameter
    Equating Method
    Further Explorations into Equating Error
        Absolute Difference in Equating Error
        Equating Error Conditional on Score Points
    Chapter Summary

CHAPTER FIVE DISCUSSION AND CONCLUSION
    Research Question 1
    Research Question 2
    Research Question 3
    Research Question 4
    Limitations and Future Research
    Conclusions and Implications

APPENDIX A. TABLES
APPENDIX B. FIGURES
REFERENCES

LIST OF TABLES

Table A1. Factors Controlled in the Simulation
Table A2. Previous Comparison Studies Conducted Under the CINEG Design
Table A3. Mean and Standard Deviation of Item Parameters
Table A4. Descriptive Statistics of Item Difficulty (b) Parameters
Table A5. Aggregated Weighted Absolute Bias for Different Sample Sizes
Table A6. Aggregated Weighted Standard Error for Different Sample Sizes
Table A7. Aggregated Weighted Root Mean Square Error for Different Sample Sizes
Table A8. Aggregated Weighted Absolute Bias for Different Group Differences
Table A9. Aggregated Weighted Standard Error for Different Group Differences
Table A10. Aggregated Weighted Root Mean Square Error for Different Group Differences
Table A11. Aggregated Weighted Absolute Bias for Different Proportions of Common Items
Table A12. Aggregated Weighted Standard Error for Different Proportions of Common Items
Table A13. Aggregated Weighted Root Mean Square Error for Different Proportions of Common Items
Table A14. Aggregated Weighted Absolute Bias for Different Form Differences
Table A15. Aggregated Weighted Standard Error for Different Form Differences
Table A16. Aggregated Weighted Root Mean Square Error for Different Form Differences
Table A17. Aggregated Weighted Absolute Bias for Different Common Item Types
Table A18. Aggregated Weighted Standard Error for Different Common Item Types
Table A19. Aggregated Weighted Root Mean Square Error for Different Common Item Types
Table A20. Aggregated Weighted Absolute Bias for Different Spread of Difficulty Parameters in the Common Items
Table A21. Aggregated Weighted Standard Error for Different Spread of Difficulty Parameters in the Common Items
Table A22. Aggregated Weighted Root Mean Square Error for Different Spread of Difficulty Parameters in the Common Items
Table A23. Aggregated Weighted Absolute Bias, Weighted Standard Error, and Weighted Root Mean Square Error for Different Smoothing Parameters
Table A24. Aggregated Weighted Absolute Bias, Weighted Standard Error, and Weighted Root Mean Square Error for Different Equating Methods

LIST OF FIGURES

Figure B1. Smoothing Methods Producing the Smallest WRMSE for Different Sample Sizes
Figure B2. Smoothing Methods Producing the Smallest WAB for Different Group Differences
Figure B3. Smoothing Methods Producing the Smallest WSE for Different Group Differences
Figure B4. Smoothing Methods Producing the Smallest WRMSE for Different Group Differences
Figure B5. Smoothing Methods Producing the Smallest WRMSE for Different Proportions of Common Items
Figure B6. Smoothing Methods Producing the Smallest WRMSE for Different Form Differences
Figure B7. Smoothing Methods Producing the Smallest WRMSE for Different Common Item Types
Figure B8. Smoothing Methods Producing the Smallest WRMSE for Different Spreads of Difficulty Parameters in the Common-Item Sets
Figure B9. Differences in WAB, WSE, and WRMSE for Various Smoothing Methods under 192 Conditions
Figure B10. Bias, standard error, and root mean square error for frequency estimation, modified frequency estimation, and chained equipercentile equating when N = 2,000, group difference = .2, form difference = .05, and 20% mini/internal common items are used
Figure B11. Bias, standard error, and root mean square error for frequency estimation, modified frequency estimation, and chained equipercentile equating when N = 2,000, group difference = .2, form difference = .05, and 20% midi/internal common items are used
Figure B12. A Demonstration of Internal versus External Common Items

CHAPTER ONE INTRODUCTION

Test security is an increasingly serious issue in the current Internet era. If someone decides to memorize and upload a set of items to a specific forum, thousands of people can have access to operational items by a single click on their own personal computers. Even with conventional paper-based assessments, security issues are concerns because examinees often do not take the same test at the same time. To keep tests secure, alternate forms are developed in which different collections of test items are written according to the same set of content and statistical specifications (Kolen & Brennan, 2004). Alternate forms are administered to examinees who take the test on different dates. However, despite the best efforts of test developers, forms usually vary in item difficulties, raising issues of test score interchangeability. That is, two examinees having identical scores on two different forms may not have the same level of proficiency on the construct being measured. One form might be noticeably easier or more difficult than the other form, leading to issues of fairness. As a result, unintended differences in test form difficulty have to be eliminated so that examinees are treated fairly (Holland & Dorans, 2006). The statistical procedure, equating, is used to adjust for differences in form difficulty so that scores on alternate forms can be used interchangeably (Kolen & Brennan, 2004). Typically, the scale for the old form (Form Y) is established and equating preserves that score scale for the new form (Form X). Hence, scores on Form X are typically equated to scores on Form Y.

In this chapter, an overview of equating designs, methods, and assumptions for this dissertation is provided. Then, smoothing procedures are described in general terms along with the purpose of conducting smoothing in the equating process. Finally, the

purposes and study design implemented in this dissertation are examined, followed by the research questions addressed.

Equating Designs, Methods, and Assumptions

Before equating is implemented, a data collection design should be selected. Commonly used designs include random groups, single group, and common-item nonequivalent groups (CINEG; also called the nonequivalent groups with anchor test [NEAT]) design (Kolen & Brennan, 2004). In the random groups design, randomly equivalent groups take different forms. Since the groups are assumed to be randomly equivalent, differences in scores are attributed solely to form differences. In the single group design, the same examinees take both the old and new forms. Since both forms are administered to a single group, the differences in scores can also be ascribed to form differences as long as order effects are controlled. In the CINEG design, which is the focus of this dissertation, two nonequivalent groups, or naturally occurring groups, take different test forms on different dates. Differences in test scores can be the result of a combination of form differences and group differences. To disentangle form and group effects, a set of common items proportionally representing the total test forms in terms of content and statistical characteristics is embedded within the two alternate forms (Livingston, 2004). Furthermore, because common items are assumed to behave similarly in different test forms, the location of those items and the wording and ordering of answer alternatives should be identical in the old and new forms. Generally, the common-item set is constructed as a mini-version of the total test form to precisely account for group differences. To ensure that the common items statistically represent the full forms, very easy or very difficult items have to be included in the common-item set. However, because those items are in short supply, groups of researchers have suggested relaxing this requirement by using midi common items (see, e.g., Sinharay & Holland, 2006a, 2006b, 2007). A more

detailed description of mini versus midi common-item sets can be found in Chapter 2 of this dissertation.

Equating methods should also be taken into consideration. Under the CINEG design, linear equating methods such as the Tucker method (Gulliksen, 1950), the Levine observed score method (Levine, 1955), and the Levine true score method (Levine, 1955) can be used. However, only equipercentile methods were considered in this dissertation. Since form-to-form differences in difficulty are defined as a curve in equipercentile equating methods, they are more flexible than linear methods. However, an issue remains. The functions used in the equipercentile equating method assume continuous variables, but test scores are usually discrete (e.g., only integer values are possible). Therefore, various continuizing methods are adopted, including linear interpolation and kernel methods. Traditional equipercentile equating involves continuizing and assumes that scores are uniformly distributed within a range of ±.5 around a given integer score, whereas the kernel method presumes a Gaussian kernel. The kernel method will be discussed in more detail shortly.

Within the CINEG design, two different methods of equipercentile equating can be implemented: frequency estimation equipercentile equating (FE) and chained equipercentile equating (CE). Synthetic population weights should be determined when using the FE method. Additionally, the conditional distribution of total scores given each common-item score, V = v, is assumed to be the same in both populations for both Form X and Form Y. Because of the assumption made, the FE method should be carried out only when the two populations are very similar (Kolen & Brennan, 2004). The CE method, in contrast, does not make any assumptions about the similarity of populations. As the label implies, a chain of two equipercentile equatings is entailed with this method. Computationally, CE is much less intensive than FE because the joint distribution of total score and common-item score is not considered. However, CE equates two tests (total test and common-item set) having fairly unequal lengths, which is odd because the scores

are not likely to be comparable. Furthermore, the population in which the resulting equating relationship holds is uncertain, since the concept of a synthetic population is not directly encompassed in the process of CE (see Chapter 5 of Kolen and Brennan (2004) or Chapter 2 of this dissertation for detailed information on each equating method). Additionally, Wang and Brennan (2006, 2009) proposed the modified frequency estimation (MFE) method, which altered one of the traditional assumptions of the FE method in correcting for equating bias. In a simulation study, Wang and Brennan (2006, 2009) found that MFE resulted in better performance than the FE method. Furthermore, it performed better than the CE method under most conditions considered in their study.

A different approach to observed-score equating, namely, kernel equating (KE), was introduced by Holland and Thayer (1989) and von Davier, Holland, and Thayer (2004). Kernel equating involves five basic steps: (1) presmoothing the score probabilities, (2) estimating the score probabilities, (3) continuizing the score probabilities, (4) computing the equating relationship, and (5) calculating the standard error of equating (von Davier et al., 2004). Unlike other smoothing methods, the presmoothing process is embedded within the equating procedure. Each equating method is dealt with in further detail in Chapter 2.

Smoothing

With every equating procedure, two types of error are present. One is systematic error (bias), and the other is random error, captured by the standard error (SE) of equating. Focusing on a specific score on Form X, $x_i$, the total equating error can be expressed as

$$\hat{e}_Y(x_i) - e_Y(x_i), \qquad (1.1)$$

where $e_Y(x_i)$ is defined as the population equipercentile equivalent score, and $\hat{e}_Y(x_i)$ is defined as the sample estimate, where the expected value of $\hat{e}_Y(x_i)$ over replications is assumed to equal $e_Y(x_i)$, i.e., $E[\hat{e}_Y(x_i)] = e_Y(x_i)$, where $E$ is the expectation operator. Then, the SE of equating can be defined as

$$SE[\hat{e}_Y(x_i)] = \sqrt{E\{[\hat{e}_Y(x_i) - e_Y(x_i)]^2\}}. \qquad (1.2)$$

The SE of equating is conceptualized as the standard deviation of score equivalents over replications of the equating procedure (Kolen & Brennan, 2004, p. 23). Random error present in the equating process can be reduced by using large sample sizes and/or an equating design that reduces such error. Also, smoothing can be implemented to reduce random error. By defining $\hat{t}_Y(x_i)$ as an alternative estimator of the population equipercentile equivalent score and $t_Y(x_i) = E[\hat{t}_Y(x_i)]$, total error can be separated into random error and systematic error:

$$\hat{t}_Y(x_i) - e_Y(x_i) = [\hat{t}_Y(x_i) - t_Y(x_i)] + [t_Y(x_i) - e_Y(x_i)], \qquad (1.3)$$

where the first part of the equation quantifies random error and the rest quantifies systematic error. When $\hat{t}_Y(x_i)$ is defined as an estimator resulting from an equating procedure with smoothing, $E[\hat{t}_Y(x_i)]$ does not equal $e_Y(x_i)$, which indicates that smoothing introduces systematic error. The purpose of using smoothing in equating is to reduce the random error present in equating procedures without introducing too much systematic error (Kolen & Brennan, 2004; Liu & Kolen, 2011).

Generally, two different smoothing methods have been examined to improve the equating procedure: presmoothing and postsmoothing (Kolen & Brennan, 2004). In presmoothing, the score distributions are smoothed before equating is conducted, whereas in postsmoothing, equipercentile equivalents (i.e., the products of equating) are smoothed. The most frequently used methods in the literature are the polynomial log-linear, strong true score presmoothing, and cubic spline postsmoothing methods (Liu & Kolen, 2011). Since strong true score presmoothing methods such as the beta4 method are not flexible enough to be used in general equating situations (Colton, 1995; Kolen, 1991), they were not considered in this dissertation.
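To make these definitions concrete, the following sketch computes the bias, standard error, and root mean square error of an equating function at each Form X score from a set of replicated equatings and a criterion (population) equating relationship. It is a minimal illustration in NumPy; the array layout and the toy numbers are assumptions for demonstration, not the evaluation code used in this dissertation.

```python
import numpy as np

def equating_error_summary(replicated_equivalents, criterion_equivalents):
    """Bias, standard error, and RMSE of estimated equating functions.

    replicated_equivalents: array (R, K+1) of Form Y equivalents, one row per
        replication, one column per Form X integer score.
    criterion_equivalents:  array (K+1,) of population (criterion) equivalents.
    """
    reps = np.asarray(replicated_equivalents, dtype=float)
    crit = np.asarray(criterion_equivalents, dtype=float)

    mean_est = reps.mean(axis=0)              # E[t_hat(x_i)] over replications
    bias = mean_est - crit                    # systematic error at each score
    se = reps.std(axis=0, ddof=0)             # random error at each score
    rmse = np.sqrt(bias**2 + se**2)           # total error: RMSE^2 = bias^2 + SE^2
    return bias, se, rmse

# Toy usage with fabricated numbers (illustration only).
rng = np.random.default_rng(0)
crit = np.linspace(0, 40, 41)                 # identity criterion for a 40-item form
reps = crit + rng.normal(0.2, 0.5, size=(100, 41))
bias, se, rmse = equating_error_summary(reps, crit)
print(np.round(bias[:3], 2), np.round(se[:3], 2), np.round(rmse[:3], 2))
```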

Purposes and Study Design

Although researchers and major testing companies frequently use various smoothing methods, few empirical investigations have been conducted to compare their performances. This is especially so for the common item nonequivalent groups (CINEG) design (Antal, Kim, & Lee, 2011). The current dissertation will address this gap in the literature, and identify the best-performing smoothing method under various testing conditions. The factors controlled in the data simulation include the proportion of common items, form difficulty differences, types of common items (internal or external), spread of difficulties for common items, sample size, and group differences (i.e., effect size). Moreover, in the equating process, various equating methods including frequency estimation, modified frequency estimation, chained equipercentile equating, and kernel equating were considered. Then, in conjunction with the different equating methods, polynomial log-linear presmoothing and cubic spline postsmoothing were conducted with a set of varying smoothing parameters.

Research Questions

The main purpose of this dissertation was to compare the performance of various smoothing methods under differing testing conditions. Specific research questions included:

1. How do variations in sample size, group difference, proportion of common items, form difference, common-item type (i.e., internal or external), spread of the difficulty in common items, smoothing parameter, and equating method affect equating errors (i.e., bias, standard error, and root mean square error)?

2. How do smoothing methods within the equating methods (i.e., FE, MFE, and CE) compare in equating error when the examinees vary?
   2-1. To what extent does the performance of smoothing methods vary depending on the sample size?
   2-2. To what extent does the performance of smoothing methods vary depending on the group mean differences (i.e., effect size)?

3. How do various smoothing methods within the equating methods (i.e., FE, MFE, and CE) compare in equating error when the test forms vary?
   3-1. To what extent does the performance of smoothing methods vary depending on the proportion of common items?
   3-2. To what extent does the performance of smoothing methods vary depending on the form difficulty differences?
   3-3. To what extent does the performance of smoothing methods vary depending on the types of common items, i.e., external or internal?
   3-4. To what extent does the performance of smoothing methods vary depending on the spread of the difficulty parameters in the common-item sets?

4. How do various smoothing methods affect equating error in conjunction with different equating methods and different smoothing parameters?
   4-1. To what extent does the performance of smoothing methods vary with the equating method used (i.e., FE, MFE, and CE)?
   4-2. To what extent does the performance of smoothing methods vary with different smoothing parameters?
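To illustrate the factorial structure implied by the factors listed above, the sketch below enumerates a fully crossed design whose factor counts (3 × 4 × 2 × 2 × 1 × 2 × 2 = 192) match the number of conditions reported in the abstract. The specific factor levels shown are placeholders, not the values used in this study.

```python
from itertools import product

# Placeholder levels; only the counts (3, 4, 2, 2, 1, 2, 2) mirror the design.
factors = {
    "sample_size":       [500, 1000, 2000],
    "group_difference":  [0.0, 0.1, 0.2, 0.3],    # effect-size units (illustrative)
    "common_item_prop":  [0.2, 0.3],
    "form_difference":   [0.0, 0.05],
    "test_length":       [60],
    "common_item_type":  ["internal", "external"],
    "difficulty_spread": ["mini", "midi"],
}

conditions = [dict(zip(factors, combo)) for combo in product(*factors.values())]
print(len(conditions))        # 192
print(conditions[0])          # first condition in the crossed design
```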

CHAPTER TWO LITERATURE REVIEW

The current chapter consists of four sections. In the first section, commonly used equating designs, including the random groups, single group, and common-item nonequivalent groups (CINEG) designs, are discussed. Next, equating methods investigated in this dissertation, including frequency estimation equipercentile equating (FE), chained equipercentile equating (CE), modified frequency estimation equating (MFE), and kernel equating (KE), are presented in detail. Then, relevant literature is reviewed, including factors affecting equating under the CINEG design, reduction in equating error by smoothing, and comparisons of performance for smoothing methods under random groups and CINEG designs.

Equating Design

As mentioned in the previous chapter, the most commonly used equating designs include the random groups, single group, and common-item nonequivalent groups (CINEG) designs (Kolen & Brennan, 2004). In the random groups design, groups are conceptualized as randomly equivalent. Typically, a spiraling process is used to randomly assign different forms. Each examinee responds to either Form X or Form Y in this design. Because the groups are conceived as equivalent in terms of their proficiency, the score differences can be solely attributed to the form difficulty difference. In the single group design, both Forms X and Y are administered to the same group of examinees. To control for possible differential order effects on scores, forms are usually counterbalanced. Half of the examinees receive Form X first whereas the other half receives Form Y first. Spiraling is adopted in this design to achieve comparable subgroups as well. Under the presumption of the two subgroups being equivalent, it can be concluded that only form difficulty differences are reflected in the score differences.

With the CINEG design, groups are not assumed to be comparable because no attempts are made to randomize the forms. Different groups are administered different forms containing unique items and a set of shared items (i.e., common items). The common-item set is developed to represent the full-length test form regarding both content and statistical specifications. As a result, the score differences under the CINEG design are due to the mixture of examinee proficiencies and form difficulty differences. The CINEG design is the design that is considered in this dissertation.

Equating Method

Various linear and equipercentile equating methods have been developed under the CINEG design. A detailed discussion of the basic concept behind equipercentile equating is provided first, followed by discussions of specific equipercentile methods including frequency estimation (FE), chained equipercentile (CE), modified frequency estimation (MFE), and kernel equating (KE).

Equipercentile Equating

When implementing equipercentile equating, percentile rank and percentile functions have to be defined first. The percentile rank is defined as

$$P(x) = \begin{cases} 0, & x < -.5, \\ 100\{F(x^*-1) + [x - (x^* - .5)][F(x^*) - F(x^*-1)]\}, & -.5 \le x < K_X + .5, \\ 100, & x \ge K_X + .5, \end{cases} \qquad (2.1)$$

where $x^*$ is the integer closest to $x$ such that $x^* - .5 \le x < x^* + .5$, $F(x^*)$ is the discrete cumulative distribution function for the Form X scores, and $K_X$ is the number of items on Form X.

Given a percentile rank $P^*$, the inverse function (i.e., percentile function) can be defined as

$$x_U(P^*) = \begin{cases} \dfrac{P^*/100 - F(x_U^* - 1)}{F(x_U^*) - F(x_U^* - 1)} + (x_U^* - .5), & 0 \le P^* < 100, \\ K_X + .5, & P^* = 100, \end{cases} \qquad (2.2)$$

where $x_U^*$ refers to the smallest integer score with a cumulative percent $[100\,F(x)]$ that is greater than $P^*$. An alternative percentile function using $x_L^*$, which is the largest integer score with a cumulative percent $[100\,F(x)]$ that is less than $P^*$, is given in Kolen and Brennan (2004, p. 45). If all possible score points have some examinees scoring those points (i.e., nonzero $f(x)$ values across all score points), the two equations produce identical results. If not, $x = (x_U + x_L)/2$ is conventionally used. For Form Y scores, the same percentile rank and percentile functions can be used along with the discrete cumulative distribution for Form Y scores, denoted as $G(y)$.

After deriving the percentile ranks, for $-.5 \le x \le K_X + .5$, the Form Y equipercentile equivalent score for a particular Form X score $x$ can be expressed as

$$e_Y(x) = Q^{-1}[P(x)], \qquad (2.3)$$

where $e_Y$ refers to a symmetric equating function converting Form X scores to the Form Y score scale; $Q^{-1}$ is an inverse function of the percentile rank (i.e., the percentile function) for Form Y scores; and $P$ denotes the percentile rank function for Form X scores. The resulting equivalent score $e_Y(x)$ from Equation (2.3) has the same percentile rank on Form Y as the percentile rank of score $x$ on Form X (Kolen & Brennan, 2004; Petersen, Cook, & Stocking, 1983).
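As a numerical illustration of Equations (2.1) through (2.3), the sketch below computes percentile ranks for Form X, inverts the Form Y percentile rank function, and returns the Form Y equivalents of the Form X integer scores. It is a simplified, self-contained rendering of the definitions above (upper-score version of the percentile function only), not the operational equating code.

```python
import numpy as np

def percentile_rank(x, freqs):
    """Percentile rank P(x) of Equation (2.1) for relative frequencies over scores 0..K."""
    K = len(freqs) - 1
    if x < -0.5:
        return 0.0
    if x >= K + 0.5:
        return 100.0
    F = np.concatenate(([0.0], np.cumsum(freqs)))   # F[j] = Pr(X <= j - 1)
    xs = int(round(x))                               # x*: integer closest to x
    return 100.0 * (F[xs] + (x - (xs - 0.5)) * (F[xs + 1] - F[xs]))

def percentile_function(p, freqs):
    """Inverse of the percentile rank (Equation 2.2), upper-score (x_U) version."""
    K = len(freqs) - 1
    if p >= 100.0:
        return K + 0.5
    F = np.cumsum(freqs)                             # F[j] = Pr(X <= j)
    xu = int(np.argmax(100.0 * F > p))               # smallest score with cumulative % > p
    F_below = F[xu - 1] if xu > 0 else 0.0
    return (p / 100.0 - F_below) / (F[xu] - F_below) + (xu - 0.5)

def equipercentile_equivalents(freqs_x, freqs_y):
    """Form Y equivalents e_Y(x) of Equation (2.3) at each Form X integer score."""
    return np.array([percentile_function(percentile_rank(x, freqs_x), freqs_y)
                     for x in range(len(freqs_x))])

# Toy usage: two 5-item forms whose score distributions differ slightly in difficulty.
fx = np.array([.05, .15, .30, .30, .15, .05])
fy = np.array([.03, .12, .25, .32, .20, .08])
print(np.round(equipercentile_equivalents(fx, fy), 3))
```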

Frequency Estimation (FE) Equipercentile Equating

The frequency estimation method estimates cumulative distributions of Form X and Form Y scores for a synthetic population, assuming that the conditional distribution of total score given each common-item set score is identical for both Forms X and Y in both populations. The cumulative distributions for the synthetic population can be expressed by combining the weighted distributions for each population. Synthetic population weights are used to construct the weighted combination of the distributions. Mathematically, the synthetic population distributions for Form X and Form Y can be expressed as

$$f_s(x) = w_1 f_1(x) + w_2 f_2(x) \qquad (2.4)$$

and

$$g_s(y) = w_1 g_1(y) + w_2 g_2(y). \qquad (2.5)$$

Subscripts $s$, 1, and 2 refer to the synthetic population, Population 1 who took Form X, and Population 2 who took Form Y, respectively. Function $f$ refers to the distribution of Form X, $g$ refers to the distribution of Form Y, and $w_1$ and $w_2$ refer to the synthetic population weights, where $w_1 + w_2 = 1$ and $w_1, w_2 \ge 0$. Because of the data collection design, $f_1(x)$ and $g_2(y)$ can be directly estimated. However, $f_2(x)$ and $g_1(y)$ are not estimable directly from the data collected. As a result, a statistical assumption that, in both populations, the conditional distribution of total scores given each common-item score, $V = v$, is identical for both Form X and Form Y is evoked. The given assumption can also be stated as a pair of equations:

$$f_1(x \mid v) = f_2(x \mid v), \text{ for all } v, \qquad (2.6)$$

and

$$g_1(y \mid v) = g_2(y \mid v), \text{ for all } v. \qquad (2.7)$$

Then, the mathematical expressions for the synthetic population become

$$f_s(x) = w_1 f_1(x) + w_2 \sum_v f_1(x \mid v)\, h_2(v) \qquad (2.8)$$

and

$$g_s(y) = w_1 \sum_v g_2(y \mid v)\, h_1(v) + w_2 g_2(y), \qquad (2.9)$$

where $h_1(v)$ and $h_2(v)$ indicate the marginal distributions of the common-item scores for Populations 1 and 2, respectively. Using the estimated synthetic population distributions, percentile ranks for both forms can be derived.
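A small sketch of Equations (2.4) through (2.9): given the two observed joint distributions of total and common-item scores, the synthetic-population marginals are formed under the FE assumption. The array orientation (rows = total scores, columns = common-item scores) and the default synthetic weights are assumptions of this illustration.

```python
import numpy as np

def fe_synthetic_distributions(joint1_xv, joint2_yv, w1=0.5):
    """Synthetic-population marginals f_s(x) and g_s(y) for FE equating.

    joint1_xv: array (K_X+1, L), joint probabilities of (X, V) in Population 1.
    joint2_yv: array (K_Y+1, L), joint probabilities of (Y, V) in Population 2.
    """
    w2 = 1.0 - w1
    f1 = joint1_xv.sum(axis=1)          # marginal of X in Population 1
    g2 = joint2_yv.sum(axis=1)          # marginal of Y in Population 2
    h1 = joint1_xv.sum(axis=0)          # marginal of V in Population 1
    h2 = joint2_yv.sum(axis=0)          # marginal of V in Population 2

    # Conditional distributions f_1(x|v) and g_2(y|v); guard against empty anchor columns.
    with np.errstate(invalid="ignore", divide="ignore"):
        f1_given_v = np.nan_to_num(joint1_xv / h1)
        g2_given_v = np.nan_to_num(joint2_yv / h2)

    f2 = f1_given_v @ h2                # sum_v f_1(x|v) h_2(v), using assumption (2.6)
    g1 = g2_given_v @ h1                # sum_v g_2(y|v) h_1(v), using assumption (2.7)

    f_s = w1 * f1 + w2 * f2             # Equation (2.8)
    g_s = w1 * g1 + w2 * g2             # Equation (2.9)
    return f_s, g_s

# Toy usage with tiny fabricated joint distributions (3 total scores, 2 anchor scores).
P = np.array([[.10, .05], [.20, .25], [.10, .30]])
Q = np.array([[.15, .05], [.25, .20], [.10, .25]])
f_s, g_s = fe_synthetic_distributions(P, Q)
print(np.round(f_s, 3), round(f_s.sum(), 6), round(g_s.sum(), 6))   # marginals sum to 1
```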

Defining $P_s$ as the percentile rank function for Form X, $Q_s$ as the percentile rank function for Form Y, and $P_s^{-1}$ and $Q_s^{-1}$ as percentile functions enables us to formulate the equipercentile function for the synthetic population as follows:

$$e_{Ys}(x) = Q_s^{-1}[P_s(x)]. \qquad (2.10)$$

Modified Frequency Estimation (MFE) Equating

In the modified frequency estimation method proposed by Wang and Brennan (2006, 2009), the basic assumptions made for FE equating in Equations (2.6) and (2.7) are modified. Instead, MFE assumes

$$f_1(x \mid t_v) = f_2(x \mid t_v) \qquad (2.11)$$

and

$$g_1(y \mid t_v) = g_2(y \mid t_v). \qquad (2.12)$$

That is, the conditional distributions of the total scores given the true score of the common-item scores are assumed to be the same in both populations for both Form X and Form Y. Then, adopting the approach suggested by Brennan and Lee (2006), Equations (2.11) and (2.12) can be replaced with

$$f_1(x \mid v_1) = f_2(x \mid v_2) \qquad (2.13)$$

and

$$g_1(y \mid v_1) = g_2(y \mid v_2). \qquad (2.14)$$

One interesting fact is that $v_1$ does not have to be identical to $v_2$, although both are linked to $t_V$, and the corresponding $v_1$ values for all $v_2$ can be computed by setting $\hat{t}_{v_1} = \hat{t}_{v_2}$:

$$v_1 = \mu_1(v) + \frac{\mu_2(v) - \mu_1(v)}{\rho_1(v, v')} + \frac{\rho_2(v, v')}{\rho_1(v, v')}\,[v_2 - \mu_2(v)], \qquad (2.15)$$

where $\rho_1(v, v')$ and $\rho_2(v, v')$ refer to the reliabilities of the common-item scores in Populations 1 and 2, respectively, and $\mu_1(v)$ and $\mu_2(v)$ are the corresponding common-item score means.
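The correspondence in Equation (2.15) can be sketched as a one-line function: given the common-item means and reliabilities in the two populations, each Population 2 anchor score v2 is mapped to the Population 1 anchor score v1 associated with the same estimated true score. The numeric values below are purely illustrative, and the sketch covers only this mapping step, not the full MFE procedure of Wang and Brennan (2006, 2009).

```python
import numpy as np

def v1_from_v2(v2, mu1, mu2, rho1, rho2):
    """Population 1 anchor score with the same estimated true score as v2 (Equation 2.15)."""
    return mu1 + (mu2 - mu1) / rho1 + (rho2 / rho1) * (v2 - mu2)

# Toy usage: anchor means 11.0 and 10.2, reliabilities .80 and .78 (illustrative values).
v2_scores = np.arange(0, 21)
v1_scores = v1_from_v2(v2_scores, mu1=11.0, mu2=10.2, rho1=0.80, rho2=0.78)
print(np.round(v1_scores[:5], 2))
```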

Then, similar to the FE method, the data collected for the MFE method do not provide any means to directly estimate $f_2(x)$ and $g_1(y)$. However, both distributions can be estimated indirectly as follows:

$$f_2(x) = \sum_{v_2} f_1(x \mid v_1)\, f_2(v_2) \qquad (2.16)$$

and

$$g_1(y) = \sum_{v_1} g_2(y \mid v_2)\, f_1(v_1), \qquad (2.17)$$

where $v_1$ and $v_2$ are linked through Equation (2.15). The marginal distributions for Form X and Form Y scores then can be estimated by plugging the estimates given in Equations (2.16) and (2.17) into Equations (2.4) and (2.5). For more detailed steps, see Wang and Brennan (2006, 2009). The subsequent process is identical to the steps followed in the traditional FE method (see the section describing the FE method). The MFE method differs from FE in two major aspects (Hou, 2007). First, unlike the FE method, the reliabilities of the common-item scores in the two populations are used in the process. Second, a special step is involved in obtaining the marginal distributions for the populations administered Forms X and Y.

Chained Equipercentile (CE) Equating

Chained equipercentile equating (Dorans, 1990; Livingston, Dorans, & Wright, 1990) is an alternative method to FE equating. The CE method was discussed in Angoff's 1971 chapter on scales, norms, and equivalent scores, and denoted as the direct equipercentile method by Marco, Petersen, and Stewart (1983). A three-step process is involved in the CE method (Kolen & Brennan, 2004). First, the equipercentile equating relationship $e_{V1}(x)$ is found. In this step, the conversion of Form X scores to the common-item score scale for Population 1 is derived. Next, the equipercentile equating relationship $e_{Y2}(v)$ is found. Common-item scores for Population 2 are converted to the Form Y scale. Lastly, by using the two equating relationships derived from the previous steps, Form Y equipercentile equivalents of the scores on Form X are found. The chaining process can be expressed as

$$e_{Y(\mathrm{chain})}(x) = e_{Y2}[e_{V1}(x)]. \qquad (2.18)$$
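Equation (2.18) amounts to composing two tabled conversions. The sketch below evaluates e_Y2 at e_V1(x) by linear interpolation over the anchor score scale; the use of interpolation between integer anchor scores, and the fabricated conversion tables in the example, are simplifying assumptions of this illustration.

```python
import numpy as np

def chain_equate(x_to_v_pop1, v_scores, v_to_y_pop2):
    """e_Y(chain)(x) = e_Y2[e_V1(x)]: compose two tabled conversions (Equation 2.18).

    x_to_v_pop1: e_V1(x) tabled at the Form X integer scores (Population 1).
    v_scores:    integer scores of the common-item set.
    v_to_y_pop2: e_Y2(v) tabled at the common-item integer scores (Population 2).
    """
    # Evaluate the Population 2 anchor-to-Y conversion at the (generally non-integer)
    # anchor equivalents produced by the Population 1 conversion.
    return np.interp(x_to_v_pop1, v_scores, v_to_y_pop2)

# Toy usage: a 10-item form with a 4-item anchor (fabricated conversion tables).
e_v1 = 0.4 * np.arange(11)                      # pretend Form X -> anchor, Population 1
v = np.arange(5)
e_y2 = np.array([0.0, 2.6, 5.1, 7.6, 10.0])     # pretend anchor -> Form Y, Population 2
print(np.round(chain_equate(e_v1, v, e_y2), 2))
```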

Kernel Equating (KE)

Kernel equating is a unified approach to test equating based on a flexible family of equipercentile-like equating functions (von Davier et al., 2004). The name kernel equating was coined because a nonparametric density estimation method that uses a Gaussian kernel is employed. As described in Chapter 1, the KE method follows a five-step procedure. First, presmoothing is conducted. Statistical models are fit to the raw data to attain estimated univariate and/or bivariate distributions of test scores. von Davier et al. (2004) use a set of parametric log-linear models to fit the data, to discover the one that describes the data well and as simply as possible. Next is estimation of score probabilities. Using the design function (DF), marginal score probabilities for all possible score points, denoted as $\hat{r}$ and $\hat{s}$ for tests X and Y on the target population, $T$ (i.e., the synthetic population), are obtained from the score distributions estimated in the previous step. The DF used in this step depends on the data collection design as well as the equating method being implemented. When the FE method is used under the CINEG design, the DF is given by

$$DF(P, Q; w) = \begin{pmatrix} r(P, Q, w) \\ s(P, Q, w) \end{pmatrix}, \qquad (2.19)$$

where $P$ refers to the joint distribution of Form X scores and the common-item scores for Population 1; $Q$ refers to the joint distribution of Form Y scores and the common-item scores for Population 2; and $w$ denotes the synthetic weight given to Population 1 $(0 \le w \le 1)$. To be explicit,

$$r = \sum_l \left[ w + \frac{(1-w)\, t_{Ql}}{t_{Pl}} \right] p_l \qquad (2.20)$$

and

$$s = \sum_l \left[ (1-w) + \frac{w\, t_{Pl}}{t_{Ql}} \right] q_l, \qquad (2.21)$$

where $t_{Pl}$ and $t_{Ql}$ refer to the column sums of $P$ and $Q$, respectively; and $p_l$ and $q_l$ refer to the $l$th columns of $P$ and $Q$, respectively.

Unlike the FE method, $r$ and $s$ are not directly calculated when CE is used. Instead, two levels are considered. The first level deals with the two DFs originating from the two single group designs within the CINEG design, namely, $DF_P$ and $DF_Q$. Each is mathematically defined as

$$DF_P(P) = \begin{pmatrix} M_P\, v(P) \\ N_P\, v(P) \end{pmatrix} = \begin{pmatrix} r_P \\ t_P \end{pmatrix} \qquad (2.22)$$

and

$$DF_Q(Q) = \begin{pmatrix} N_Q\, v(Q) \\ M_Q\, v(Q) \end{pmatrix} = \begin{pmatrix} t_Q \\ s_Q \end{pmatrix}, \qquad (2.23)$$

where $r_P$, $s_Q$, $t_P$, and $t_Q$ refer to the column vectors of score probabilities for Form X, Form Y, and the common-item scores over Populations 1 and 2, respectively. $M$ and $N$ are matrices of 0s and 1s used to transform the vectorized score probabilities into the row sums and the column sums, respectively. In the second level, a single function arises combining the two DFs given in Equations (2.22) and (2.23):

$$DF(P, Q) = \begin{pmatrix} DF_P(P) \\ DF_Q(Q) \end{pmatrix} = \begin{pmatrix} M_P & 0 \\ N_P & 0 \\ 0 & N_Q \\ 0 & M_Q \end{pmatrix} \begin{pmatrix} v(P) \\ v(Q) \end{pmatrix} = \begin{pmatrix} r_P \\ t_P \\ t_Q \\ s_Q \end{pmatrix}. \qquad (2.24)$$

After obtaining the DF, a log-linear model is used to estimate the score distribution. Steps 1 and 2 are conducted simultaneously. The score distributions for Forms X and Y are still discrete at this point, which does not allow the equipercentile framework to be satisfied.
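A sketch of Equations (2.20) and (2.21): the FE (post-stratification) design function turns the two joint distributions P and Q into the target-population marginals r and s for a given synthetic weight w. The matrix orientation (rows = total scores, columns = common-item scores) is an assumption of this sketch, and the toy probabilities are fabricated.

```python
import numpy as np

def fe_design_function(P, Q, w):
    """Target-population marginals r and s for kernel FE equating (Equations 2.20-2.21).

    P: joint probabilities of (X, V) in Population 1, shape (J_X, L).
    Q: joint probabilities of (Y, V) in Population 2, shape (J_Y, L).
    w: synthetic weight given to Population 1, 0 <= w <= 1.
    """
    t_P = P.sum(axis=0)                           # column sums: anchor marginal in Pop 1
    t_Q = Q.sum(axis=0)                           # column sums: anchor marginal in Pop 2
    r = P @ (w + (1.0 - w) * t_Q / t_P)           # sum_l [w + (1-w) t_Ql / t_Pl] p_l
    s = Q @ ((1.0 - w) + w * t_P / t_Q)           # sum_l [(1-w) + w t_Pl / t_Ql] q_l
    return r, s

# Toy usage with fabricated joint distributions (3 total scores, 2 anchor scores), w = 0.5.
P = np.array([[.10, .05], [.20, .25], [.10, .30]])
Q = np.array([[.15, .05], [.25, .20], [.10, .25]])
r, s = fe_design_function(P, Q, 0.5)
print(np.round(r, 3), round(r.sum(), 6))          # r is a proper distribution (sums to 1)
```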

Then, continuization is conducted. Since both score distributions for Form X and Form Y are discrete, continuous approximations $\hat{F}_{h_X}(x)$ and $\hat{G}_{h_Y}(y)$ are found using the Gaussian kernel. The density for the kernel function for Form X is expressed as

$$f_{h_X}(x) = \sum_j r_j\, \frac{1}{a_X h_X}\, \phi\!\left( \frac{x - a_X x_j - (1 - a_X)\mu_X}{a_X h_X} \right), \qquad (2.25)$$

where $\phi(\cdot)$ is the standard normal density function; $h_X$ is the bandwidth; $j$ refers to the integer values ranging from 1 to the number of items in Form X; $\mu_X$ is the mean of the Form X scores in the target population; and $r_j$ refers to the score probability of score $x_j$ over the synthetic population. $a_X$ is defined as

$$a_X^2 = \frac{\sigma_X^2}{\sigma_X^2 + h_X^2}. \qquad (2.26)$$

The choice of the bandwidth $h_X$ governs the smoothness or roughness of the estimated distribution. When the bandwidth is set to a fairly small value, $F_{h_X}(x)$ becomes a close continuous approximation to $F$, which is a discontinuous jump function. As a result, the estimated density function displays spikiness. Selecting the bandwidth is a subjective process bearing the risk of under- or over-smoothing. von Davier et al. (2004) proposed automated methods for selecting the optimal values using penalty functions that force the continuized density $f_{h_X}(x)$ to be close to the original discrete distribution while suppressing the density from displaying an excessive number of zero second derivatives (i.e., too many spikes or modes) at each score point (for a detailed discussion of the choice of the bandwidth, see von Davier et al., 2004).

The subsequent step is the equating phase, where the general equipercentile equating function in Equation (2.10) is adopted as

$$\hat{e}_Y(x) = \hat{G}_{h_Y}^{-1}[\hat{F}_{h_X}(x)]. \qquad (2.27)$$

The final step involves the computation of the standard error of equating (SE) and the standard error of the difference (SEED) between two kernel equating functions. Analogous to the SE derived for other equating methods, the uncertainty in the estimated equating relationship due to sampling variability is measured by the SE in KE as well.
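The Gaussian-kernel continuization in Equations (2.25) and (2.26) can be sketched directly: each discrete score point is replaced by a normal distribution whose location and scale are chosen so that the mean and variance of the discrete distribution are preserved. The sketch below returns the continuized CDF F_h(x); it does not implement the penalty-based bandwidth selection of von Davier et al. (2004), and the bandwidth shown is arbitrary.

```python
import numpy as np
from scipy.stats import norm

def kernel_cdf(x, scores, probs, h):
    """Gaussian-kernel continuized CDF F_h(x) for a discrete score distribution.

    Follows Equations (2.25)-(2.26): a^2 = sigma^2 / (sigma^2 + h^2), and each score
    point x_j is replaced by a normal centered at a*x_j + (1-a)*mu with SD a*h.
    """
    mu = np.sum(probs * scores)
    var = np.sum(probs * (scores - mu) ** 2)
    a = np.sqrt(var / (var + h ** 2))
    centers = a * scores + (1.0 - a) * mu
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return np.sum(probs * norm.cdf((x[:, None] - centers) / (a * h)), axis=1)

# Toy usage: continuize a 5-item score distribution and evaluate the CDF.
scores = np.arange(6)
probs = np.array([.05, .15, .30, .30, .15, .05])
print(np.round(kernel_cdf([1.0, 2.5, 4.0], scores, probs, h=0.6), 3))
```

In a full implementation, the KE equivalents of Equation (2.27) would then be obtained by numerically inverting the analogous continuized Form Y CDF at each value of F_hX(x).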

SEED can be used to compare the difference between two kernel equating relationships. Detailed mathematical procedures and equations for deriving the SEs and SEEDs under the FE and CE methods are provided in von Davier et al. (2004). By examining each step of KE, it is obvious that the procedure is quite computationally intense. Nonetheless, KE often results in less random error than other equating methods because of the consistent use of presmoothing and smooth transformations of the data. Furthermore, the KE method is fully equipped with an elegant system for estimating equating functions and estimating their SEs, available for all commonly used equating designs (single group design, random groups design, and CINEG design).

Smoothing Method

In this section, three smoothing methods, including strong true score presmoothing, polynomial log-linear presmoothing, and cubic spline postsmoothing, are discussed. All procedures have been commonly used in test equating settings. However, most research studies on smoothing have focused only on the performance of these methods under the random groups design, calling for more investigation of their performance in reducing error under various realistic conditions using the CINEG design.

Polynomial Log-Linear Presmoothing

In the polynomial log-linear presmoothing method, polynomial functions are fit to the log of the sample density (Kolen & Brennan, 2004). The fitted model can be expressed mathematically as

$$\log[N_X\, f(x)] = \omega_0 + \omega_1 x + \omega_2 x^2 + \cdots + \omega_C x^C, \qquad (2.28)$$

where $C$ denotes the degree of the polynomial. The parameters can be estimated by a maximum likelihood method. One of the fascinating properties of the fitted distribution is the moment preservation property. That is, the fitted distribution has first $C$ moments identical to those of the sample distribution.
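Because maximum likelihood estimation of the model in Equation (2.28) is equivalent to a Poisson regression of the score frequencies on powers of the score, the presmoothing step can be sketched with a generic GLM routine, as below. The choice of statsmodels and the standardized polynomial basis are illustrative assumptions, not the software used in this dissertation; the fitted frequencies reproduce the first C raw score moments, which the usage example checks.

```python
import numpy as np
import statsmodels.api as sm

def loglinear_presmooth(freqs, degree):
    """Fit log[N f(x)] = w0 + w1 x + ... + wC x^C by Poisson maximum likelihood (Eq. 2.28)."""
    scores = np.arange(len(freqs), dtype=float)
    z = (scores - scores.mean()) / scores.std()          # standardized scores for stability
    design = np.column_stack([z ** p for p in range(degree + 1)])
    fit = sm.GLM(freqs, design, family=sm.families.Poisson()).fit()
    return fit.fittedvalues                               # smoothed frequencies

# Toy usage: smooth a bumpy frequency distribution for a 10-item form with C = 3,
# then check that the first three raw score moments are preserved.
raw = np.array([3, 9, 22, 41, 55, 60, 48, 33, 18, 8, 3], dtype=float)
smoothed = loglinear_presmooth(raw, degree=3)
scores = np.arange(len(raw))
for c in (1, 2, 3):
    print(c, round(np.average(scores ** c, weights=raw), 3),
          round(np.average(scores ** c, weights=smoothed), 3))
```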

Selecting $C$ is a subjective choice based on graphical fit and various $\chi^2$ statistics (Haberman, 1974; Hanson, 1990). First, the overall goodness of fit comparing the fitted model to the empirical density is considered. Then, models not excluded by the significance tests are further investigated. By examining the difference statistic, $\chi^2_C - \chi^2_{C+1}$, with one degree of freedom, the simplest model with adequate fit is chosen. To amplify, the value of $C$ is selected that is one greater than the largest value of $C$ that has a significant $\chi^2$ (Kolen & Brennan, 2004, p. 80). Along with the hypothesis tests described above, the graphical fit and central moments should also be examined, because multiple significance tests are related in the process of choosing the degree of the polynomial function.

Bivariate Extension

To use the log-linear presmoothing method under the CINEG design, an extension of the original idea is required. The fitted model is an extension of Equation (2.28), which is expressed as

$$\log(p_{jk}) = \gamma + u_{jk} + \sum_{i=1}^{C_X} \omega_{Xi}\, x_j^i + \sum_{i=1}^{C_V} \omega_{Vi}\, v_k^i + \sum_{i=1}^{C_{IX}} \sum_{i'=1}^{C_{IV}} \omega_{XVii'}\, x_j^i\, v_k^{i'}, \qquad (2.29)$$

where $p_{jk}$ is the fitted probability for the total score $j$ and the common-item score $k$; $\gamma$ is a normalizing constant constraining the $p_{jk}$s to sum to one; $u_{jk}$ is the null distribution, which can be perceived as the center of any log-linear model; and $C_{IX}$ and $C_{IV}$ are both integer values that sum up to the number of cross-product moments to be preserved after fitting the specified model. The $\omega$s are the parameters estimated in the process of fitting a specified model, equivalent to the $\omega$s from the univariate log-linear model. As with the univariate procedure, the bivariate log-linear presmoothing also has a moment-preservation characteristic. By fitting the model given in Equation (2.29), the fitted bivariate (joint) distribution preserves $C_X$ moments in the marginal distribution of Form X scores, $C_V$ moments in the marginal distribution of common-item scores, and $C_{XV}$ cross-moments in the joint distribution of Form X total scores and common-item scores. To denote the bivariate log-linear model fitted to the distributions, the notation takes the form $(C_X, C_V, C_{XV})$. Holland and Thayer (1998, 2000) indicated that models including quite high powers (e.g., five or six moments) were necessary to fit the univariate marginals of bivariate distributions adequately. Moreover, they suggested that models with cross-moments of the form of Equation (2.29) with $i, i' \le 2$ adequately represent the joint distribution for a majority of problems.

When the FE method is implemented, procedures described in Holland and Thayer (1998, 2000), von Davier et al. (2004), and Rosenbaum and Thayer (1987) can be implemented. As aforementioned, the essence of the extension is that the log-linear model is fitted to the joint distribution of common-item scores and total scores. The model fitting calls for the specification of the number of moments for each marginal distribution that are to be identical to the observed distribution moments. Additionally, the number of cross-product moments for which the fitted joint distribution is to be identical to the empirical distribution has to be specified. In contrast, if the CE method is used, the joint distribution of the total test scores and common-item scores is smoothed, and the marginal distributions are exclusively used afterwards (Livingston, 1993). Kolen and Brennan (2004) suggested an alternative method for applying log-linear smoothing to CE by smoothing only the marginal distributions of total test scores and common-item scores, thereby making the number of cross-product moments fitted irrelevant.

Strong True Score Presmoothing

In contrast to the polynomial log-linear presmoothing method, a parametric true score model is specified with the strong true score presmoothing method (Kolen & Brennan, 2004). The beta4 method, introduced by Lord (1965), is the most widely used and researched procedure. The population distribution of proportion-correct true scores,

$\psi(\tau)$, is assumed to be a four-parameter beta distribution. The distribution of observed score given true score, $f(x \mid \tau)$, also takes a conditional parametric form, which is Lord's two-term approximation to a compound binomial distribution. Mathematically, the observed score distribution is then expressed as

$$f(x) = \int_0^1 f(x \mid \tau)\, \psi(\tau)\, d\tau, \qquad (2.30)$$

and is referred to as the four-parameter beta compound binomial distribution or the beta4 distribution. The first four central moments of the observed score distribution are preserved in the fitted distribution because the parameters are estimated by the method of moments (Lord, 1965; Hanson, 1991). Furthermore, the estimated true score distribution is a smooth four-parameter beta distribution, and the resulting observed score distribution is also a smoothed distribution (Kolen & Brennan, 2004). That is, a smoothed observed score distribution is obtained indirectly, as a result of smoothing the true score distribution $\psi(\tau)$ (Cui, 2006). Similar to the polynomial log-linear presmoothing method, the fit of the model can be evaluated with the strong true score methods. As mentioned in the previous section, the evaluation can be based on graphical fit as well as $\chi^2$ goodness-of-fit statistics (Kolen & Brennan, 2004). The degrees of freedom are $K - 4$ for the beta4 method, assuming that all score points are included in the computation. Keats and Lord (1962) portray a simplification of the beta4 method, called a beta-binomial or a negative hypergeometric distribution. In this model, a two-parameter beta distribution is assumed for the true scores, and a binomial model is assumed for the conditional distribution of the observed scores given true scores. However, the application of the beta-binomial model in an equating setting is quite difficult, since the beta-binomial distribution is less flexible than the beta4 distribution (Kolen, 1991).

Bivariate Extension

As with log-linear presmoothing, the beta4 model has to be extended to be used under the CINEG design. The extension can be expressed as

$$p_{jk} = \int_0^1 p[v_k \mid g(\tau)]\, p(x_j \mid \tau)\, \psi(\tau)\, d\tau, \qquad (2.31)$$

where $p_{jk}$ is the fitted probability for the total score $j$ and the common-item score $k$; $\psi(\tau)$ is the true score distribution; and $g(\tau)$ is the common-item proportion-correct true score as a function of the proportion-correct true score (Hanson, 1991). The application with the extended model is quite similar to that of the polynomial log-linear presmoothing method. Other than the fact that the beta4 model assumes parametric true score models, the procedure operates identically to the log-linear presmoothing method in the process of equating.

Cubic Spline Postsmoothing

In the method described in this section, equipercentile equivalents, $\hat{e}_Y(x)$, are postsmoothed rather than presmoothed. Cubic smoothing splines were originally described by Reinsch (1967) and subsequently adapted by Kolen (1984). The equations for cubic splines given in this dissertation follow Kolen and Brennan (2004). Given integer scores $x_i$, the cubic spline function for the interval $x_i \le x < x_{i+1}$ is

$$\hat{d}_Y(x) = v_{0i} + v_{1i}(x - x_i) + v_{2i}(x - x_i)^2 + v_{3i}(x - x_i)^3, \qquad (2.32)$$

where the weights $v_{0i}$, $v_{1i}$, $v_{2i}$, and $v_{3i}$ change from one integer score to the next. The cubic spline function should have continuous second derivatives at each score point, $x_i$, over the range $0 \le x \le K_X$, where $K_X$ refers to the number of items in Form X.

34 22 xh 2 dˆ ''( x ) dx, (2.33) xl where ˆ '' d is the second derivative of ˆd (Kolen & Jarjoura, 1987). Second is to satisfy high 2 dˆ ( ) ˆ Y xi ey ( xi ) SE eˆ ( x ) S, (2.34) x x 1 i low Y i high low where x low is the lower integer score in the range; and x high is the upper integer score in the range. The terms are summed over the range of scores for which the spline is fit, to x high. ˆ ( ), where 0 xlow x xhigh K X equipercentile equating. SE e x is the estimated standard error of Y i x low S is a parameter that has to be equal to or larger than 0, and it has control over the smoothness of the equating relationship (Kolen, 1984; Kolen & Brennan, 2004). When S 0, the estimated unsmoothed equivalents and the fitted spline are equal at all integer scores. However, if S equals a very large number, the spline function becomes linear. By setting the constraint specified in Equation (2.34), the degree that the spline function values are allowed to deviate from the observed function values (Kolen, 1984, p.29) is restricted. The choice of S equatings using various values of S is made by the investigator based on a series of. Values between 0 and 1 have been observed to provide acceptable results in empirical studies (Cui, 2006; Cui & Kolen, 2009; Hanson, Zeng, & Colton, 1994; Kolen & Brennan, 2004; Kolen & Jarjoura, 1987; Moses & Liu, 2011). To estimate the cubic splines, Kolen (1984) reported using integer score points with percentile ranks between.5 and By excluding the extreme scores, we can avoid unjustified influence of scores with few examinees and poor estimates of standard errors. For extreme scores excluded from the estimated spline function, a linear interpolation

For extreme scores excluded from the estimated spline function, a linear interpolation procedure was introduced in Kolen (1984) and discussed in more detail by Kolen and Brennan (2004).

Traditionally, the symmetry property has been one of the requirements of an estimated equating relationship (Kolen & Brennan, 2004). That is, the function transforming a score on Form X to the Form Y scale should be the inverse of the function transforming a score on Form Y to the Form X scale. However, the cubic spline is a regression function, so the spline used for converting Form X to Form Y is different from the spline used for converting Form Y to Form X. Therefore, to achieve a symmetric function, a spline function $\hat{d}_X(y)$, converting Form Y scores to the Form X scale, is defined. Then, the inverse of this function can be expressed as $\hat{d}_X^{-1}(x)$. Finally, the average of the two spline functions $\hat{d}_Y(x)$ and $\hat{d}_X^{-1}(x)$ is defined as the final equipercentile equating function estimated:

$$\hat{d}_Y^*(x) = \frac{1}{2}\left[ \hat{d}_Y(x) + \hat{d}_X^{-1}(x) \right], \qquad (2.35)$$

where $-.5 \le x \le K_X + .5$. Unlike the presmoothing procedures, no statistical test exists for selecting the smoothing parameter $S$. Kolen and Brennan (2004) recommended using a value resulting in a smoothed function that does not depart too much (approximately ±1 standard error of equating) from the unsmoothed equivalents. Furthermore, they suggested comparing the central moments of the Form X scores equated to the Form Y scale using smoothing to those of the Form Y scores.

To apply cubic spline postsmoothing under the CINEG design, especially when the FE method is used, the procedure laid out in the previous paragraphs can be used directly. One caveat is that the standard errors of FE developed by Jarjoura and Kolen (1985) should be substituted for the $SE[\hat{e}_Y(x_i)]$ in Equation (2.34). However, instead of using the analytic standard errors suggested by Jarjoura and Kolen (1985) and

Liou and Cheng (1995), a bootstrap procedure can be implemented to derive $SE[\hat{e}_Y(x_i)]$ as well. Cubic spline postsmoothing can also be implemented when the CE method is used. The estimates of $e_{V1}(x)$ and $e_{Y2}(v)$ are smoothed, and standard errors of single group equating can be used for the $SE[\hat{e}_Y(x_i)]$ term. The smoothed functions can then be plugged into the population relationship in Equation (2.18).

Review of Relevant Literature

In this section, studies relevant to the questions asked in this dissertation are reviewed. This includes literature on factors affecting equating error under the CINEG design, reduction in equating error by smoothing, factors affecting smoothing in test equating, and comparisons of the performance of smoothing methods under the random groups and CINEG designs.

Factors Affecting Equating Under the CINEG Design

Under the CINEG design, assuming equivalent forms constructed following the content and statistical specifications closely, equating methods have a tendency to result in similar equating relationships if the groups administered the old and new forms are fairly similar (Kolen, 1990). Moreover, when the groups taking the old and new forms are equivalent or when the common-item scores are perfectly correlated with total test scores, the FE and CE methods provide equivalent equating functions (von Davier et al., 2004). Powers et al. (2011) found that increases in systematic equating error were associated with larger group differences for the FE method, whereas no obvious trend existed between the extent of group difference and systematic error for the CE method. Various studies support these findings (e.g., Hagge & Kolen, 2011; Harris & Kolen, 1990; Kim & Lee, 2009; Kolen, 1990; Lawrence & Dorans, 1990; Livingston et al., 1990; Marco et al., 1983; Powers & Kolen, 2011; Schmitt, Cook, Dorans, & Eignor, 1990;

Sinharay & Holland, 2007; Skaggs & Lissitz, 1986; von Davier et al., 2004; Wang, Lee, Brennan, & Kolen, 2008). As mentioned in Chapter 1, because of the FE assumption of groups being similar, Kolen and Brennan (2004) asserted that the CE method might be preferred over the FE method when the groups are considerably different. The study by Livingston et al. (1990) supports their suggestion. Variability in group proficiencies can also influence the amount of equating error. Wang et al. (2008), for example, found that having different group variances for the two groups taking the old and new forms resulted in more bias in estimating the equating relationship. Moreover, Kim and Lee (2009) showed that increasing the sample size considerably reduced the amount of random and total error.

Additionally, the number of common items relative to the total test affects the accuracy of equating (Kolen & Brennan, 2004). Ricker and von Davier (2007) discovered a tendency for bias to increase as the length of the common-item set decreased, but the FE methods were less susceptible than the CE method. Holland, Sinharay, von Davier, and Han (2008) and Wang et al. (2008) also reported that as the relative length of the common-item set increases, less bias was displayed for both the FE and CE methods. Having a larger number of common items also results in less random error (Budescu, 1985). Petersen et al. (1983) suggested that a problem might be found in the equating by having too few items in the common-item set. Results from Wang et al. (2008) indicated that an increase in total test length is associated with more equating error when the ratio of common items and the sample size were kept constant for all test lengths. However, Kim and Lee (2009) discovered that the overall amount of equating error increased as test length increased. Wang et al. (2008) and Kim and Lee (2009) noted that a longer test length requires a larger sample size and a longer common-item set to obtain the same level of equating accuracy.

Also, inadequate content and/or statistical representation of the common items could lead to critical problems, especially when the groups taking the old and new forms differ in their level of proficiency (Cook & Petersen, 1987; Klein & Jarjoura, 1985; Kolen & Brennan, 2004). However, Sinharay and Holland (2006a, 2006b, 2007) asserted that, as long as the content representativeness of the common items is maintained and their mean item difficulty is kept identical to that of the equated test, the restriction on the range of the item difficulties might be relaxed. Specifically, with what they termed a midi anchor, the spread of the item difficulties in the common-item set was smaller than that of the total test. Under this relaxed condition, their results indicated that the performance of the equating methods was equivalent to the results using the mini anchor. Moreover, Sinharay and Holland (2007) suggested that when external anchors are used, implementing midi anchors should be most advantageous. Subsequent studies displayed evidence that midi anchors provide adequate equating results, producing either smaller or very similar amounts of equating error relative to using mini anchors (e.g., Sinharay, Haberman, Holland, & Lewis, 2012; Liu, Sinharay, Holland, Curley, & Feigenbaum, 2011; Liu, Sinharay, Holland, Feigenbaum, & Curley, 2009, 2011).

To summarize, in this section, factors influencing the performance of equating procedures were examined. Differences between the groups administered the old and new forms affected the estimation of equating relationships. Equating error grew as the group ability differences, in terms of their mean or variability, increased. In particular, when the group differences were reasonably large, the use of CE instead of FE was recommended based on the assumptions underlying each procedure (e.g., Kolen & Brennan, 2004; Livingston et al., 1990; Powers et al., 2011). Equating error tended to diminish when the sample size and the proportion of common items increased. Nonetheless, the total test length did not seem to have a direct effect on the accuracy of equating relationship estimation. Using contentwise and statistically representative common items was crucial in obtaining accurate equating results.

Reduction in Equating Error by Smoothing

According to Kolen and Brennan (2004), both presmoothing and postsmoothing are supported by empirical evidence regarding their potential to reduce equating error. In this section, literature on the reduction in equating error by smoothing will be reviewed. The first part consists of studies conducted on smoothing univariate distributions under the random groups design; the second part focuses on the effectiveness of bivariate smoothing under the CINEG design.

Smoothing under the Random Groups Design

Cope and Kolen (1990) showed that methods involving smoothing, such as the kernel, four-parameter beta compound binomial, and Cureton-Tukey methods, improved equating accuracy compared to the unsmoothed frequencies. Little and Rubin (1994) also provided evidence that smoothed equipercentile equating relationships led to less error than unsmoothed equipercentile equating functions. Fairbank (1987) emphasized a caveat of using smoothing to increase equating accuracy: average signed deviations may increase while root mean square deviations decrease. He asserted that negative hypergeometric presmoothing helps ensure that the benefits outweigh this cost for many purposes. Hanson et al. (1994) also found that presmoothing and postsmoothing methods increased accuracy in equipercentile equating. Kolen (1991) reviewed several smoothing procedures, including the kernel, strong true-score model-based, and polynomial log-linear postsmoothing methods, and pointed out that they have potential to improve the estimation of test score distributions. Zeng (1995) stated that a moderate amount of postsmoothing facilitates more accurate equating relationships; however, an optimal degree of smoothing, in terms of a fixed smoothing parameter S that works under all situations, was not discovered.

Extension for the CINEG Design

Over time, many extensions of smoothing methods have been developed. Rosenbaum and Thayer (1987) developed an extension of log-linear presmoothing for the FE method, which was empirically shown to have potential to reduce equating error (Livingston & Feryok, 1987). Various bivariate smoothing methods, such as the polynomial log-linear and beta4 methods, were compared by Hanson (1990), and considerable improvement in estimation was found. Also, Kolen and Jarjoura (1987) stated that the extension of cubic spline postsmoothing used with FE increased the precision of the equating. Livingston (1993) investigated chained equipercentile (CE) equating with log-linear presmoothing with small sample sizes under the CINEG design. Three separate smoothing procedures were performed: preserving the first two univariate moments and the first bivariate moment, i.e., (2, 2, 1); preserving the first three univariate moments and the first bivariate moment, i.e., (3, 3, 1); and preserving the first four univariate moments and the first bivariate moment, i.e., (4, 4, 1). Since the author was only interested in small-sample equatings, four small sample sizes (25, 50, 100, and 200) were randomly drawn without replacement from the population distribution 50 times. Population distributions were defined using data from a 100-item Advanced Placement Examination in United States History administered to 93,283 examinees. Two 58-item sub-forms were created, with 24 items comprising a common-item set. To define the population equating conversion, a direct equating under a single group design using responses from all 93,283 examinees was conducted. Evaluation criteria included conditional root-mean-squared deviation (RMSD) and bias at each score point, and RMSD for the full population of examinees. Livingston found that log-linear presmoothing allowed the sample size needed for a given degree of equating accuracy with CE to be reduced by at least half.

Taken together, studies under both the random groups and CINEG designs reveal that smoothing reduces equating error. Under the random groups design, the kernel, four-parameter compound binomial, Cureton-Tukey, negative hypergeometric, polynomial log-linear, and cubic spline methods showed special promise; under the CINEG design, polynomial log-linear presmoothing, cubic spline postsmoothing, and kernel equating (with presmoothing embedded within the procedure) produced less equating error.

Factors Affecting Smoothing in Test Equating

In this section, studies conducted on the factors that affect smoothing in test equating are reviewed: sample size, test length, systematic irregularities in the population distribution, population distribution similarity, and smoothing parameters. The most obvious factor affecting the effectiveness of smoothing is sample size, because of its close relationship with random error (Hanson, 1991). As sample size increases, random error decreases; therefore, equating with small samples is expected to benefit more from smoothing than equating with large samples (Cui, 2006). As expected, the proportion of total error reduced by smoothing decreased as the sample size increased in most studies (e.g., Cope & Kolen, 1990; Cui, 2006; Cui & Kolen, 2009; Hanson, 1990; Hanson, 1991; Hanson et al., 1994; Liu & Kolen, 2011; Zeng, 1995). Also, more smoothing was required to attain fairly low equating error as the sample size decreased (Colton, 1995). However, in equating circumstances with large sample sizes such as 5,000 or 10,000, the amount of bias introduced exceeded the amount of random error reduced, resulting in larger total error compared to the unsmoothed equating relationship (e.g., Cui, 2006; Hanson et al., 1994; Liu & Kolen, 2011). The smoothing methods resulting in the smallest total equating error differed depending on the sample sizes. Hence, more investigation is needed to provide guidelines for the best-performing smoothing method under differing circumstances (Kolen, 1991).

Different test lengths were also included as factors in some studies. Liu and Kolen (2011) used two different test lengths of 24 items and 48 items. In terms of the average mean squared error of equating, smoothing was consistently more effective in reducing total error when the tests were longer. Furthermore, results reported by Kolen (1984) indicated that for shorter tests, smaller amounts of smoothing were needed. Nevertheless, results provided in Fairbank (1987) tell a different story. For example, the negative hypergeometric method produced decreasing root-mean-squared deviation (RMSD) values as the test length increased for simulated data; however, the cubic spline method resulted in more equating error as the simulated test lengthened. Colton (1995) suggested that the relationship between test length and sample size is reciprocal. In other words, he asserted that the effect of a decrease in sample size is typically identical to the effect of an increase in test length. The effects of systematic irregularities in population distributions on the performance of smoothing were examined by Moses and Liu (2011). The results suggested that test score distributions with systematic irregularities made the implementation of smoothing more complicated relative to equating situations using relatively smooth population distributions. When the population distribution had systematic irregularities, different combinations of smoothing and equating procedures were identified as performing well depending on the evaluation criterion considered. When matching the mean and standard deviation of the Form Y scores and the smoothness of the equating relationship served as the criteria, linear equating, postsmoothing, and kernel procedures performed well. However, when matching the systematic irregularities and the skew of the test scores were used as the evaluation criteria, equipercentile equating methods with or without smoothing displayed the best performance. Population distribution similarity with respect to shape is also known to have an effect on smoothing performance (Colton, 1995). Colton (1995) considered two differing conditions in his study: similarly negatively skewed distributions (similar) and

one negatively skewed and one nearly symmetric distribution (dissimilar). The results suggested that for situations with similar population distributions, larger amounts of smoothing tended to perform better than when the population distributions were dissimilar. Results from Cui (2006) also displayed similar patterns, especially for the cubic spline postsmoothing method. Nonetheless, when an extremely small sample size of 25 was considered in Colton (1995), no equating (the identity method) resulted in the lowest total equating error in most cases for the similar forms. Colton (1995) reported that equipercentile equating with polynomial log-linear presmoothing with C = 2 was the most robust procedure to dissimilarity in population distributions. Cui (2006) showed that beta-4 smoothing was the most robust, followed by cubic spline postsmoothing with S = .20, when the population distributions were skewed in opposite directions. The effect of having a bimodal population score distribution can also be reviewed by examining the results in Cui (2006). All in all, equating coupled with smoothing procedures produced the least total error when identical population distributions were used. Equating a unimodal distribution to a bimodal distribution produced slightly larger equating error, followed by the situation where a form having a positively skewed population distribution was equated back to a form having a negatively skewed score distribution. Hence, the impact of having opposite skewness seems to be larger than that of having a bimodal distribution. Another factor that can be considered is the effect of different smoothing parameters. Smoothing methods are known to reduce random sampling error while introducing negligible systematic error in equating. It can be readily inferred that a moderate amount of smoothing should perform well, whereas excessive smoothing might lead to larger total error relative to the unsmoothed equating functions by introducing too much bias. Kolen (1984) reported that sample size and the degree of similarity of the old and new forms influenced the optimal degree of smoothing. Moreover, several researchers agree that no fixed smoothing parameter seems to provide the most

appropriate amount of smoothing under all situations (e.g., Cui & Kolen, 2009; Hanson et al., 1994; Zeng, 1995). When considering the studies in this section as a whole, the following trends were noted. First, a clear pattern of less effective smoothing was found as sample size increased. Second, a consistent effect of test length on error was not found. Third, a subjective selection of an equating/smoothing procedure was especially needed when systematic irregularities in the population distribution were present, because different criteria (e.g., data-matching, smoothness of the equating function) produced different best-performing combinations. Fourth, as the shapes of the population distributions became more similar to each other, equating functions tended to be more accurate. Finally, the optimal amount of smoothing varied depending on two other factors: sample size and the degree of the form differences.

Comparing the Performance of Smoothing Methods

In this section, literature related to comparing the relative performance of various smoothing methods will be reviewed. Studies under the random groups design will be discussed first, followed by studies under the CINEG design. Studies in each part are presented in chronological order.

Smoothing under the Random Groups Design

Fairbank (1987) compared fourteen smoothing methods, of which seven were presmoothing methods and seven were postsmoothing methods, including the cubic spline method. All analyses were conducted under the random groups design with equipercentile equating. Two separate approaches, using simulated data and operationally administered data, were taken. For the simulation, three different test lengths of 15, 30, and 50 items were simulated, while the two operational datasets were drawn from a 25-item Mathematics Knowledge test and a 20-item Electronics Information test administered to a very large sample (approximately 100,000 examinees). For both operational and

simulated data, 100 replicated samples of 2,000 examinees each were used (the samples used with the operational data were randomly drawn without replacement from the larger samples). The negative hypergeometric method produced decreasing root mean square deviation (RMSD) values as the test length increased for simulated data; however, the cubic spline method resulted in more equating error as the simulated test lengthened. Fairbank (1987) recommended conducting further study on negative hypergeometric presmoothing and cubic spline postsmoothing. Furthermore, results indicated that, relative to using one of the two methods separately, applying both presmoothing and postsmoothing to the same equating procedure did not provide markedly more accurate estimates of equating relationships. Hanson (1990) compared three methods for improving the estimation of test score distributions that are used more frequently than the smoothing methods examined by Fairbank (1987): the kernel, polynomial, and four-parameter beta binomial methods. Population distributions were defined from three tests administered to a large number of examinees. Five hundred replicated samples were simulated for four different sample sizes of 500, 1,000, 2,000, and 5,000. All smoothness assumption-based methods considered in the study yielded estimates with smaller error compared to using the observed raw score distribution. Specifically, the effect of smoothing decreased as sample size increased. Among the smoothing methods, the four-parameter beta binomial method displayed the best performance across all conditions; however, the polynomial method provided equivalently accurate estimates when the sample size equaled 5,000. The kernel method was found to perform better than the polynomial method when test scores displayed a relatively flat distribution. Cope and Kolen (1990) compared five methods for estimating test score distributions: unsmoothed sample frequencies and the kernel, negative hypergeometric, four-parameter beta compound binomial, and Cureton-Tukey methods. Population distributions were defined by raw scores on a 40-item ACT English Usage test and a 75-

item Mathematics Usage test administered to 272,244 examinees. Five hundred replications were conducted for four different sample sizes: 500, 1,000, 2,000, and 5,000. Results showed that the four-parameter beta compound binomial method provided the least error under all circumstances considered, closely followed by the kernel method, which produced slightly more erroneous estimates. Furthermore, as with the results in Hanson (1990), the proportion of total error reduced by smoothing decreased as the sample size increased in most cases. Cope and Kolen (1990) suggested that by using the four-parameter beta compound binomial or kernel methods, the sample size required to achieve a target level of estimation error might drop considerably. Hanson et al. (1994) compared three presmoothing methods (the beta binomial, four-parameter beta binomial, and log-linear methods) and the cubic spline postsmoothing method. The random groups design was used for data collection, and linear and equipercentile equating were selected as the equating methods. Five pairs of population distributions were defined using real data collected from a large number of examinees (approximately 2,000 to 80,000, depending on the exam). Because of the relatively small sample sizes, score distributions on the tests administered to only 2,000 to 3,000 examinees were fitted to different models to eliminate the bumpiness so that they could serve better as the population distributions. For five different sample sizes (100, 250, 500, 1,000, and 3,000), 500 samples were drawn from each of the five pairs of population distributions. Ten different equating relationships were then estimated on the sampled distributions, including one unsmoothed function; five functions based on presmoothing methods (the polynomial log-linear model with C equal to 3, 4, and 6; the beta binomial model; and the four-parameter beta binomial model); three functions based on the cubic spline postsmoothing method with S equal to 0.10, 0.25, and 0.5; and one linear function. Hanson et al. (1994) concluded that results generated by presmoothing and postsmoothing methods were comparable with one another. When sample sizes were small, presmoothing based on the beta binomial model produced the most accurate

results. However, when sample sizes were around 1,000 or more, presmoothing based on the four-parameter beta binomial model and the log-linear model resulted in more accurate equating functions. Between the two procedures, the log-linear presmoothing method might be preferred due to its greater flexibility. However, the drawback of using the log-linear presmoothing method instead of the four-parameter beta binomial procedure is that it involves a selection process in which the particular log-linear model to be used is decided by the investigator. By examining the mean squared error (MSE), it was evident that smoothing seemed to be needed more when sample sizes were relatively small, and no fixed smoothing parameter provided the most appropriate amount of smoothing under all conditions. Colton (1995) compared ten different equating and smoothing combinations: no equating, mean equating, linear equating, unsmoothed equipercentile equating, equipercentile equating with polynomial log-linear presmoothing (C = 2, 3, 6), and equipercentile equating with cubic spline postsmoothing (S = 0.25, 0.5, 0.75). Observed raw score distributions obtained from three 75-item ACT English Forms, A (negatively skewed), C (negatively skewed), and G (nearly symmetrical), administered to 3,153, 3,128, and 2,971 examinees, respectively, were smoothed with the polynomial log-linear method with C equal to 10 and defined as the population distributions. The two groups administered the old and new forms were intended to be randomly equivalent. Four sample sizes (25, 100, 400, and 3,200) were drawn 500 times from the old and new form pairs. Old and new forms were designated on the basis of their distributions being similar (Form A and Form C) or dissimilar (Form A and Form G). More smoothing was required to attain fairly low equating error as the sample size decreased. However, Colton (1995) suggested that no equating might be preferred with small sample sizes (400 or less) because equating with such samples might lead to equating errors that are practically unacceptable. When an extremely small sample size of 25 was considered, no equating (the identity method) resulted in the lowest total equating error in most cases for the similar forms.

Results also indicated that the relationship between test length and sample size is reciprocal. That is, the effect of a decrease in sample size was typically identical to the effect of an increase in test length. Population distribution similarity with respect to shape was also considered in his study: similarly negatively skewed distributions (similar) and one negatively skewed and one nearly symmetric distribution (dissimilar). For situations with similar population distributions, applying a larger amount of smoothing tended to perform better than when the population distributions were dissimilar. Colton concluded that equipercentile equating with polynomial log-linear presmoothing with C = 2 was the most robust procedure to dissimilarity in population distributions. In Cui (2006) and Cui and Kolen (2009), the relative performance of fifteen equating functions was compared. The methods included unsmoothed equipercentile equating; three established methods, namely beta-4 presmoothing, polynomial log-linear presmoothing (C = 3, 4, 5, 6), and cubic spline postsmoothing (S = 0.10, 0.25, 0.5); and two newly proposed methods, cubic B-spline presmoothing (K = 3, 5, 7) and direct presmoothing (D = 3, 5, 7). The comparison was conducted under the random groups design and the equipercentile equating framework. Sample pairs of sizes 300, 1,000, and 3,000 were drawn 10,000 times from three pairs of population raw score distributions defined by smoothing empirical datasets. Three subtests from the Iowa Tests of Basic Skills (ITBS) Form K and Form L (Hoover, Frisbie, & Dunbar, 1993), namely Capitalization, Maps and Diagrams, and Reference Materials, administered to 2,306 and 2,382 examinees for the old and new forms, respectively, were smoothed by fitting a 3-parameter logistic (3PL) model. The three subtests represented different population distributions. The Capitalization subtest was selected due to the nearly identical distributions of the old and new forms (the old form served as both the old and new forms in the analyses); the Maps and Diagrams subtest was selected due to the old and new form distributions being skewed in opposite directions; and the

Reference Materials subtest was selected due to the old form distribution having two modes. Even though the purpose of the Cui (2006) and Cui and Kolen (2009) studies was to compare the performance of two new smoothing methods to conventional methods, the comparison among the established methods is more closely related to the purposes of this dissertation. By examining the error captured by the weighted standard error (WSE) and the weighted root mean square error (WRMSE), cubic spline postsmoothing was found to be best-performing, followed by polynomial log-linear presmoothing, for the Reference Materials and Capitalization subtests, while the beta-4 method resulted in the smallest WSE and WRMSE, followed by cubic spline postsmoothing, for the Maps and Diagrams subtest. The relative amount of reduction in equating error with respect to different sample sizes was congruent with previous studies in which smoothing was more advantageous when a small sample was used. With respect to the different distribution shapes considered, results displayed similar patterns to Colton (1995), especially for the cubic spline postsmoothing method, where a larger amount of smoothing showed better performance when the population distributions were similar to each other. Cui (2006) showed that beta-4 smoothing was the most robust, followed by cubic spline postsmoothing with S = .20, when the population distribution for one form was negatively skewed and the other was positively skewed. The effect of having a bimodal population score distribution was also examined in Cui (2006). In general, the equating and smoothing procedures produced the least total error when identical population distributions were used. Equating a unimodal distribution to a bimodal distribution produced slightly more equating error, followed by the situation where a form having a positively skewed population distribution was equated back to a form having a negatively skewed score distribution. Hence, it could be argued that the impact of opposite skewness was larger than that of having a bimodal distribution under the studied conditions. The optimal degree of smoothing that performed best in all circumstances was not readily identified.

Liu and Kolen (2011) evaluated the relative performance of polynomial log-linear presmoothing and cubic spline postsmoothing using fixed smoothing parameters for equipercentile equating under the random groups design. Seventeen sets of equating functions were estimated using different smoothing procedures and/or smoothing parameters: unsmoothed; polynomial log-linear presmoothing with C equal to 2, 3, 4, 5, 6, 7, 8, and 9; and cubic spline postsmoothing with S equal to 0.01, 0.05, 0.1, 0.2, 0.3, 0.5, 0.75, and 1.0. A large span of sample sizes was considered: 100, 200, 500, 1,000, 2,000, 5,000, and 10,000. For each sample size, 500 samples were drawn from the population. Four pseudo-tests were constructed using the College Board Advanced Placement Program (AP) Biology and Environmental Science test forms: two long pseudo-tests consisting of 48 multiple-choice (MC) items and 2 free-response (FR) items, and two short pseudo-tests consisting of 24 MC items and 1 FR item. Hence, the constructed pseudo-tests included Biology long (BL), Environmental Science long (ESL), Biology short (BS), and Environmental Science short (ESS). All forms were administered to approximately 16,000 to 17,000 examinees. The population distributions were defined as the pseudo-test score distributions for the whole dataset. Results indicated that with larger sample sizes (5,000 or larger), smoothing did not contribute substantially to reducing total equating error as quantified by the average mean square error (AMSE), particularly for lighter degrees of smoothing (C of 3 or larger and smaller values of S). Furthermore, in equating circumstances with large sample sizes such as 5,000 or 10,000, the amount of bias introduced exceeded the amount of random error reduced, resulting in larger total error compared to the unsmoothed equating relationship. Nevertheless, for smaller sample sizes (2,000 or smaller) and lighter degrees of smoothing (C of 4 or more and S of 0.50 or less), cubic spline postsmoothing was slightly better in reducing AMSE than log-linear presmoothing. In terms of AMSE, smoothing was consistently more effective in reducing total error when the tests were longer. These results are in line with results reported by Kolen (1984) indicating that for shorter tests, smaller amounts of smoothing were needed.

By reviewing studies conducted under the random groups design, three smoothing methods were distinctly identified for superior performance in reducing equating error: four-parameter beta binomial presmoothing, polynomial log-linear presmoothing, and cubic spline postsmoothing. However, the relative performance of the three recognized methods showed discrepancies from study to study. For example, some studies identified polynomial log-linear presmoothing as the best-performing method when sample size was large, whereas other studies indicated that cubic spline postsmoothing performed better. The ambiguity in these results could be due to other, confounded factors that were not included as part of the investigations. No clear guidelines were identifiable from the literature review.

Smoothing under the CINEG Design

Despite the widespread use of smoothing methods in practice, fewer empirical investigations comparing their performance have been conducted under the CINEG design relative to the random groups design (Antal et al., 2011). Hanson (1991) compared eight different equating and smoothing combinations to investigate the effectiveness of bivariate smoothing methods in common-item equipercentile equating, especially the FE method. The combinations included unsmoothed equipercentile equating; four methods of smoothed equipercentile equating (two log-linear methods, one four-parameter beta binomial method, and one four-parameter beta compound binomial method); and three linear equating methods (the Tucker method, the Levine Equally Reliable method, and the Levine Unequally Reliable method). Two log-linear smoothing models were compared. The Log-Linear 1 model (LL1) was specified to preserve three moments of the marginal distributions of the common-item scores and total test scores, and one cross-product moment of the fitted joint distribution, to be identical to the observed frequencies, i.e., (3, 3, 1). The Log-Linear 2 model (LL2) was specified to preserve four moments of the marginal distributions of the common-item scores and total test

scores, and three cross-product moments of the fitted joint distribution, to be identical to the observed frequencies, i.e., (4, 4, 3). For five sample sizes of 100, 250, 500, 1,000, and 3,000, 300 replicated samples were drawn from the pairs of population bivariate distributions. Population distributions came from three approximately parallel forms of a 100-item professional licensure exam. One of the forms (the new form; administered to 38,765 examinees) was equated back to each of the other two forms (old forms; administered to 17,824 and 39,150 examinees, respectively) separately. For each equating, out of the 100 items on a single form, 15 were internal common items and the rest were non-common items. The synthetic population weights were defined as w1 = 1 and w2 = 0. The fitted bivariate distributions based on the LL2 model were used as the population distributions. Results suggested that estimation of the equipercentile equating relationship can be improved by smoothing the bivariate distributions of common-item and total test scores for all sample sizes examined. Also, among the smoothed equipercentile equating methods, the LL1 method produced the most accurate equating functions under most of the conditions investigated, based on average MSE. As with smoothing under the random groups design, an increase in sample size led to a decrease in the proportion of total error reduced under the CINEG design as well. Furthermore, for almost all conditions considered in the study, equating error was smaller when the groups administered the old and new forms were similar. Moses and Holland (2007) compared the performance of log-linear presmoothing and KE under the CINEG design. Four degrees of presmoothing were implemented with each equating method (FE, CE, KE with FE, and KE with CE): unsmoothed equipercentile equating, M221 (log-linear presmoothing preserving the first two univariate moments and the first bivariate moment), M661 (log-linear presmoothing preserving the first six univariate moments and the first bivariate moment), and MP (the actual population log-linear model fitted to the data). Sample sizes of 100, 200, and 1,000 were replicated 500 times. The data sources were smoothed distributions coming from a large-scale verbal

assessment administered to approximately 10,000 examinees on both forms. For all FE analyses, the synthetic population weights were specified as w1 = w2 = .5. The evaluation criteria examined consisted of bias and the empirical variability of the equating functions. For the conditions considered in this study, the M221 model displayed the smallest variability and the largest bias, whereas the M661 model showed less bias but more variable equating relationships. Nevertheless, the use of FE/CE versus KE did not exhibit largely different results when the distributions were strongly smoothed (i.e., the M221 and M661 models). Moses and Holland (2007) asserted that the flexibility of KE trumps the increase in bias, leaving the method well suited for evaluating the effect of presmoothing on the estimation of equating functions. Antal et al. (2011) compared log-linear presmoothing specified to preserve the first six univariate moments and the first bivariate moment, i.e., (6, 6, 1), and cubic spline postsmoothing with S = 0.2 under the CINEG design. For equating methods, CE and FE were considered. Sample sizes of 300, 1,000, and 3,000 were simulated 100 times using the 2-parameter logistic (2PL) IRT model framework. Data came from an operational 60-item mathematics section of a college admission test, and the criterion equating relationship was generated by IRT observed score equating. Factors controlled in the simulation procedure included: smoothing method (log-linear presmoothing, cubic spline postsmoothing), equating method (FE, CE), sample size (300, 1,000, 3,000), proportion of common items (10%, 20%, 40%), difference in the mean difficulty of the test forms measured in standard deviation units of the ability (0.0, 0.5), and difference in the mean ability of the student populations measured in standard deviation units of the ability (0.0, 0.2, 0.5). Results indicated that for a sample size of 300, cubic spline postsmoothing displayed the lowest mean squared error (MSE) when CE was used. However, for FE, when both the test difficulty and population ability mean differences were fairly large (e.g., both the test difficulty shift and the population ability shift equaling 0.5), log-linear presmoothing performed better when the sample size was 300. For a sample size of 1,000, when test

difficulties were identical, cubic spline postsmoothing outperformed log-linear presmoothing; however, when the form difficulty differences were large, log-linear presmoothing produced more accurate estimates of equating relationships. When the sample size equaled 3,000, log-linear presmoothing performed better than cubic spline postsmoothing under almost all situations studied. Results also indicated that CE performed better than FE when a specific smoothing method was held fixed. The authors reached the conclusion that for CE, cubic spline postsmoothing performs better, whereas the two smoothing methods performed similarly for FE. Hence, Antal et al. (2011) suggested that the combination of CE with cubic spline postsmoothing is the most preferred. Because of the variety of equating methods that are readily available under the CINEG design (e.g., FE, CE, KE with FE, KE with CE), interpreting the results of studies examining the performance of smoothing as a whole becomes more complicated compared to studies conducted under the random groups design. The effect of smoothing should be viewed jointly with the effect of each equating method on equating error. As with the literature on studies conducted under the random groups design, well-performing smoothing methods were identifiable: log-linear presmoothing and cubic spline postsmoothing. However, the comparison between the two methods was conducted in only one study (Antal et al., 2011), and results revealed that cubic spline postsmoothing performed better for CE, while the two smoothing methods displayed equivalent performance when FE was used.

Summary of the Literature Review

In this section, literature on factors affecting equating, reduction in equating error by smoothing, factors affecting smoothing, and comparisons of the performance of smoothing methods was reviewed. A summary of each part follows. As the groups differed in the mean or variability of their proficiencies, equating error tended to increase. Several researchers recommended using CE rather than FE when

group differences are fairly large (e.g., Kolen & Brennan, 2004; Livingston et al., 1990; Powers et al., 2011). Increasing the sample size and the proportion of common items relative to the total test length improved the accuracy of equating function estimation. However, increasing the total test length did not display a straightforward influence on equating accuracy; the effect seemed to interact with other factors such as sample size and the number of common items. Furthermore, using a common-item set not representative of the full-length test produced larger equating error. For both the random groups and CINEG designs, smoothing facilitated accurate equating relationships. The smoothing methods that displayed potential for reducing equating error under the random groups design included the kernel, four-parameter compound binomial, Cureton-Tukey, negative hypergeometric, polynomial log-linear, and cubic spline methods. Furthermore, the bivariate extensions also showed the capability to reduce equating error. Methods including polynomial log-linear presmoothing, cubic spline postsmoothing, and kernel equating (with presmoothing embedded within the procedure) displayed promise for facilitating accurate equating relationships. As sample size increased, the effect of smoothing lessened. However, differences in test length did not suggest a clear pattern in the effect of smoothing on equating error. Systematic irregularities in the population distribution (i.e., score points that are impossible to obtain due to the scoring method used) call for a subjective choice of an equating and smoothing procedure, since various criteria (e.g., data-matching, smoothness of the equating function) led to different best-performing equating/smoothing combinations (Moses & Liu, 2011). Also, when population distributions were similar with respect to their shapes, equating error tended to be smaller. The choice of the optimal value for the smoothing parameter was influenced by sample size and the degree of similarity of the old and new forms. Reviews of studies comparing smoothing methods under the random groups design revealed that in most cases, cubic spline postsmoothing, four-parameter beta

binomial presmoothing, and polynomial log-linear presmoothing performed better than other methods such as the beta binomial, Cureton-Tukey, and binomial kernel methods. Setting the relative performance of each smoothing procedure aside, all methods led to improved accuracy in estimating the population raw score distribution or the equating relationship. Moreover, the smoothing methods resulting in the smallest total equating error differed depending on the conditions studied. Hence, more investigation is needed to provide guidelines for finding the best-performing smoothing method under differing circumstances (Kolen, 1991). Under the CINEG design, the best-performing smoothing method varied depending on the equating method applied to the data. When FE was used, log-linear presmoothing resulted in the least amount of error, whereas cubic spline postsmoothing performed better when CE was used (Antal et al., 2011). Most studies in which smoothing methods were compared were conducted under the random groups design (Antal et al., 2011). Specifically, only one study compared the performance of the most frequently used smoothing methods (i.e., log-linear presmoothing and cubic spline postsmoothing) under the CINEG design (a summary of the studies can be found in Table A2). Moreover, existing studies considered only some of the equating methods that are available under the CINEG design. By examining Table A2, it can easily be seen that none of the studies included MFE. The number of factors considered in each study was limited to two to four. Furthermore, the effect of factors such as common-item types and the spread of difficulty levels of common items on the performance of smoothing has never been investigated, even though they could be of interest to practitioners. Therefore, the current study aims to fill the existing gap in the literature on comparing the performance of smoothing methods under the CINEG design and to provide guidelines for selecting a particular smoothing method to be implemented under specific testing conditions by considering both the log-linear presmoothing and cubic spline postsmoothing methods, five equating methods (FE, CE, MFE, KE with FE, and

KE with CE), and seven factors (sample size, population mean, proportion of common items, form difficulty difference, common-item types, spread of difficulty levels of common items, and smoothing parameter).

CHAPTER THREE
METHODOLOGY

This chapter consists of three sections detailing the study design, simulation process, and evaluation criteria. In the study design section, the factors considered in the simulation procedure, the equating methods being studied, and the computer software used are described. The simulation section delineates the process of constructing the forms used in the simulation, the definition of the criterion equating relationship adopted in the current dissertation, and the simulation procedure itself. Lastly, in the evaluation criteria section, the various indices used to compare the relative performance of the smoothing methods are discussed.

Study Design

The study design implemented is discussed in detail in the current section. Specifically, the factors considered in the simulation procedure, the equating relationships studied in this dissertation, and the computer software used for the analyses are described.

Factors Considered

Analyses included three sample sizes, four different population distribution pairs, two levels of proportion of common items, two degrees of form difficulty differences, two types of common items, and two levels of common-item difficulty spread. Since all factors studied were fully crossed with each other, a total of 192 conditions (3 sample sizes × 4 population distribution pairs × 2 common-item proportions × 2 form difficulty differences × 1 test length × 2 common-item types × 2 common-item difficulty spreads) were investigated. The factors studied in this dissertation are summarized in Table A1.

Sample Size

Three different sample sizes were considered: 300, 2,000, and 6,000 per form. Kolen and Brennan (2004) suggested a rule of thumb of 1,500 examinees per form when

equipercentile equating is conducted under the CINEG design. A sample size of 300 represents a small sample, and 2,000 represents a sample size frequently encountered in practice. Moreover, a sample size of 6,000 represents a fairly large sample for which smoothing may be unnecessary and bias may possibly be introduced by applying smoothing methods. In all cases considered, the same numbers of examinees were used for Form X and Form Y.

Population Mean (Effect Size)

Different degrees of group mean ability differences were specified to gauge the effects of different population distributions. The levels of group differences were captured by differences in the means of the two IRT θ distributions, where θ is a variable denoting the underlying proficiency of the examinees. Three different degrees of group proficiency difference were examined: .05, .2, and .5. Kolen and Brennan (2004) indicate that mean differences between the two groups of .3 or more standard deviation units often produce different equating results depending on the equating method being used, and troublesome results are produced especially when the degree of difference becomes extreme (e.g., .5 or more standard deviation units). Wang et al. (2008) labeled mean differences as relatively large when the values are between .05 and .1, and as very large when the value is .25 or higher. Given these rules of thumb, the group mean differences considered in this dissertation are relatively small (.05), relatively large (.2), and nearly unacceptably large for obtaining accurate equating results (.5). Population distributions were all assumed to be normal. Furthermore, for all conditions investigated, a normal distribution having a mean of 0 and a standard deviation of 1, N(0, 1), was assigned to the group responding to the old form, Form Y, for simplicity. However, for the group administered the new form, Form X, four different mean abilities were assumed: N(.05, 1), N(.2, 1), N(.5, 1), and N(−.2, 1). Distributions having

60 48 a mean of.2 were included so that the effect of interaction between the direction of group differences and form difficulty differences could be examined. Proportion of Common Items Two different proportions of common items relative to the total test length were considered: 20% and 40%. As aforementioned, common-item sets were carefully constructed so that they were statistically representative of the total test. For the total test length of 60, the numbers of common items used in both forms were 12 (20%) and 24 (40%), respectively, regardless of the common-item types studied. The suggestion provided by Kolen and Brennan (2004) based on their experiences is to use at least 20% of the total test length to obtain adequate equating results unless the test length is very long in which case 30 common items might suffice. Form Difficulty Difference The differences in form difficulty were measured by the mean difficulty differences captured by the IRT b parameters calibrated under the three-parameter logistic (3PL) model. Under all conditions considered, the easier form having smaller b parameter value on average was designated to be the new form (Form X), and the more difficult form having larger b parameter value on average was assigned to be the old form (Form Y). Two different degrees of form difficulty differences were considered:.05 and.2. A difference of.05 was considered to represent two forms displaying relatively similar difficulties and.2 as dissimilar difficulties. Test Length A single total test length of 60 items was included in the simulation. According to Kolen and Brennan (2004), the rule of thumb for equating educational tests measuring multiple content areas is that the test length has to be at least items. Hence, by

having 60 items on each form, we could say that the number of items on each form satisfies the rule of thumb for obtaining an adequate equating result. Even though the total test length was 60 items in all analyses, test forms having 72 items (60 non-common items + 12 external common items) and 84 items (60 non-common items + 24 external common items) were constructed to examine the effect of having external versus internal common-item sets.

Types of Common Items

Two variations of common-item sets were studied: internal and external. When the common items were treated as internal, scores on the common-item sets contributed to the examinees' total test scores. However, when the common items were treated as external, scores on the common-item sets did not contribute to the total test scores. For instance, if internal common items were used with a 60-item test with 40% common items, a total of 60 items were used, with 24 items serving as internal common items and 36 items as non-common items. However, when external common items were used under the same condition, a total of 84 items were considered, where the 24 common items did not have any influence on the examinees' total test scores. Both internal and external common items are frequently used by testing programs. Therefore, including two different types of common items as a controlled factor in the simulation may yield practical insights for practitioners on the use of smoothing procedures in test equating.

Spread of Difficulty Levels of Common Items

Two common-item types in terms of difficulty spread were investigated: mini common items and midi common items. The mini common-item sets were constructed as suggested by Kolen and Brennan (2004) and Livingston (2004). They were statistically representative of the total test, namely a mini version of the total test form. The statistical characteristics considered in the construction of the common-item sets included the spread of item difficulties measured by the IRT parameter b. Hence, for the mini

common-item sets, the common items had a spread of item difficulties characterized by a range (−2 to 2) and standard deviation similar to those of the total test. Midi common-item sets, in contrast, were constructed to have less spread (i.e., b values ranging from −1 to 1) and a standard deviation one-half the size of that of the total test (Sinharay & Holland, 2006a, 2006b, 2007). For both common-item types, the mean item difficulty was matched as closely as possible to the mean item difficulty of the test forms to be equated (Livingston, 2004).

Equating Methods Studied

In the current section, the 29 equating/smoothing combinations investigated in this dissertation are discussed in detail along with the processes used for equating.

Test Equating Methods

Three different equating methods, FE, CE, and KE, were paired with two smoothing methods: polynomial log-linear presmoothing and cubic spline postsmoothing. Also, for each smoothing method, varying degrees of smoothing were considered. For log-linear presmoothing, three smoothing parameters (two for the marginal distributions and one for the cross-product moments) were specified. Four different sets of C parameters were considered for the log-linear presmoothing: (4, 4, 1), (4, 4, 3), (6, 6, 1), and (6, 6, 3). For the cubic spline postsmoothing, three different degrees of the smoothing parameter S were specified: .1, .3, and .5. In all, a total of 29 combinations of equating relationships were examined under each condition described earlier: four log-linear presmoothing with FE, four log-linear presmoothing with CE, four log-linear presmoothing with MFE, three cubic spline postsmoothing with FE, three cubic spline postsmoothing with CE, three cubic spline postsmoothing with MFE, four KE with FE, and four KE with CE. All equating relationships studied are listed below:

1. Log-linear presmoothing with FE: C = (4, 4, 1), (4, 4, 3), (6, 6, 1), (6, 6, 3)

2. Log-linear presmoothing with CE: C = (4, 4, 1), (4, 4, 3), (6, 6, 1), (6, 6, 3)
3. Log-linear presmoothing with MFE: C = (4, 4, 1), (4, 4, 3), (6, 6, 1), (6, 6, 3)
4. Cubic spline postsmoothing with FE: S = .1, .3, .5
5. Cubic spline postsmoothing with CE: S = .1, .3, .5
6. Cubic spline postsmoothing with MFE: S = .1, .3, .5
7. KE with FE: C = (4, 4, 1), (4, 4, 3), (6, 6, 1), (6, 6, 3)
8. KE with CE: C = (4, 4, 1), (4, 4, 3), (6, 6, 1), (6, 6, 3)

Process

For the equating methods with log-linear presmoothing, the bivariate distributions were first smoothed. Then, equipercentile equating (FE, CE, or MFE) was conducted on the smoothed score distributions. Synthetic population weights of w1 = w2 = .5 were assumed for the FE and MFE methods. When cubic spline postsmoothing was implemented with FE or MFE, however, equating was conducted first using the same set of synthetic population weights of .5, and smoothing was then applied directly to the estimated equating relationships. When cubic spline postsmoothing was used with CE, two separate equipercentile equating relationships were estimated and smoothed, and then plugged into Equation (2.18). For the KE procedures with both FE and CE, a separate smoothing step was not conducted because presmoothing is built into the equating method itself. For all methods considered, the steps were repeated 100 times on the simulated datasets.

Computer Software

R (R Development Core Team, 2011) was used to simulate the datasets under each condition. To obtain the estimated equating relationships for FE, CE, and MFE with log-linear presmoothing and cubic spline postsmoothing, and for KE, the open-source C functions in Equating Recipes (Brennan, Wang, Kim, & Seol, 2009) were used.
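To make the flow of one replication concrete, the R sketch below outlines the presmoothing and postsmoothing paths described above. It is an illustration only: the function names (draw_sample, loglinear_presmooth, fe_equate, postsmooth_cubic_spline) are hypothetical placeholders and are not the actual interface of Equating Recipes or of the operational code used in this study.

```r
# Hypothetical sketch of one simulation condition; function names are placeholders.
n_reps <- 100
results <- vector("list", n_reps)

for (rep in seq_len(n_reps)) {
  # Step 1: generate item responses for the two groups (3PL model assumed)
  resp_x <- draw_sample(form = "X", n = 2000, theta_mean = 0.2)   # new form group
  resp_y <- draw_sample(form = "Y", n = 2000, theta_mean = 0.0)   # old form group

  # Step 2a: presmooth the bivariate (total, common-item) score distributions,
  # then apply frequency estimation equipercentile equating with w1 = w2 = .5
  biv_x <- loglinear_presmooth(resp_x, C = c(6, 6, 1))
  biv_y <- loglinear_presmooth(resp_y, C = c(6, 6, 1))
  eq_presmoothed <- fe_equate(biv_x, biv_y, w1 = 0.5, w2 = 0.5)

  # Step 2b: alternatively, equate the unsmoothed distributions first and then
  # postsmooth the estimated equating function (cubic spline, S = .3)
  eq_postsmoothed <- postsmooth_cubic_spline(
    fe_equate(resp_x, resp_y, w1 = 0.5, w2 = 0.5), S = 0.3)

  results[[rep]] <- list(pre = eq_presmoothed, post = eq_postsmoothed)
}
```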

Simulation

In the current section, the simulation procedure is presented. First, the form construction procedure is discussed. Then, the definition of the criterion equating relationship used in this dissertation is described. Lastly, the simulation procedure is discussed in detail.

Form Construction

To construct test forms with realistic item parameters, Forms X and Y were built from an item parameter pool containing a, b, and c parameters calibrated under a 3PL model for a few hundred items, using a dataset collected from a large number of examinees. Using the item parameters from the pool, forms were constructed by hand-picking items in an Excel spreadsheet to satisfy each simulation condition (e.g., proportion of common items, test length, form difficulty differences, internal versus external common items, mini versus midi common items), as defined in the previous section, as closely as possible. First, 16 old forms (Form Ys) were constructed to have an average difficulty of 0 (i.e., a mean b parameter of 0). For the forms having internal common items, 60 items were hand-picked, whereas more than 60 items were hand-picked for the external common-item conditions (72 items when 20% common items were used, and 84 items when 40% common items were used). Out of the selected items, either 20% or 40% were set aside to serve as common items. Common-item sets were hand-picked so that their average difficulty equaled 0. Also, to satisfy the condition for mini versus midi common-item sets, the standard deviation of the difficulty parameter was closely controlled. For the mini common items, the standard deviation was approximately equal to that of the total form. On the other hand, when midi common items were assumed, the standard deviation was approximately half that of the total form.
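As a rough illustration of the kind of screening involved in assembling mini versus midi common-item sets, the R sketch below selects a candidate set whose mean b is near 0 and whose spread of b matches either specification. The item_pool data frame, its column names, and the resampling-based search are assumptions for illustration only; the actual forms in this study were assembled by hand in a spreadsheet.

```r
# Illustrative (not operational) screening of common-item sets from a pool.
# item_pool is assumed to be a data frame with columns a, b, and c.
pick_common_set <- function(item_pool, n_common, type = c("mini", "midi"),
                            target_mean_b = 0, total_sd_b = 1) {
  type <- match.arg(type)
  # mini: b spread matches the total test; midi: roughly half the spread
  b_range   <- if (type == "mini") c(-2, 2) else c(-1, 1)
  target_sd <- if (type == "mini") total_sd_b else total_sd_b / 2

  candidates <- subset(item_pool, b >= b_range[1] & b <= b_range[2])

  # naive search: resample candidate sets and keep the one closest to the targets
  best <- NULL
  best_dist <- Inf
  for (i in 1:500) {
    set  <- candidates[sample(nrow(candidates), n_common), ]
    dist <- abs(mean(set$b) - target_mean_b) + abs(sd(set$b) - target_sd)
    if (dist < best_dist) {
      best <- set
      best_dist <- dist
    }
  }
  best
}
```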

Then, 16 new forms (Form Xs) were constructed. The pre-specified common items automatically became a part of the new forms; hence, only the unique items were hand-picked for the new forms. The average difficulty parameters had to be as close as possible to either −.05 or −.2, depending on the form difficulty difference condition. Across forms, the averages of the discrimination, difficulty, and pseudo-chance parameters, and the standard deviation of the difficulty parameters, were matched to be as similar as possible for the old and new forms. This whole process yielded a total of 32 forms, including 16 old forms and 16 new forms. The descriptive statistics for the item parameters of each form can be found in Tables A3 and A4. None of the constructed forms included identical items. It might seem that a direct comparison of summary statistics derived from totally different test forms would not be tenable, because a different criterion would then be used to derive the summary statistics. However, the approach is sensible in two respects. First, it most closely reflects the real-world situation, in which unique items are in fact unique to each form. If a comparison between conditions using a single criterion were to be possible for all factors studied in this dissertation, some of the items that are common to the old and new forms would necessarily have to be treated as unique items for the sake of equating. In an operational testing setting, however, only the common items will actually be common to the old and new forms. Second, the computation and direct comparison of aggregated summary statistics makes more sense with non-identical forms. By averaging summary statistics over all simulated conditions except the factor of interest, we obtain the aggregated summary statistics. Since all test forms were different, the different conditions being averaged can be conceived of as random replications under the specific main effect. Aggregated summary statistics were used to discuss the performance of the smoothing methods for the different main factors studied in this dissertation. A description of aggregated summary statistics and their comparisons is provided in more detail in Chapter 4.

Criterion Equating Relationship

Equivalent scores computed using IRT observed score equating based on the item parameters and population distributions were considered the criterion equating relationship. Once again, the item parameters used to construct the test forms under the differing conditions were used to obtain the criterion equating relationships.

Steps

In IRT observed score equating, an IRT model is used to estimate the observed number-correct score distributions, and then conventional equipercentile equating is conducted on the estimated distributions weighted by the specified synthetic weights for the two populations (Kolen & Brennan, 2004). Lord and Wingersky (1984) used a compound binomial distribution for Form X to derive the conditional distribution of the observed number-correct scores given ability. A recursion formula was developed that can be used to obtain the observed score distribution. Mathematically, the recursion formula for r > 1 can be expressed as:

$$
f_r(x \mid \theta_i) =
\begin{cases}
f_{r-1}(x \mid \theta_i)\,(1 - p_{ir}), & x = 0, \\
f_{r-1}(x \mid \theta_i)\,(1 - p_{ir}) + f_{r-1}(x - 1 \mid \theta_i)\, p_{ir}, & 0 < x < r, \\
f_{r-1}(x - 1 \mid \theta_i)\, p_{ir}, & x = r,
\end{cases}
\tag{3.1}
$$

where $f_r(x \mid \theta_i)$ is defined as the conditional distribution of number-correct scores over the first $r$ items for examinees having ability $\theta_i$; $f_1(x = 0 \mid \theta_i) = 1 - p_{i1}$ is defined as the probability of incorrectly responding to the first item; and $f_1(x = 1 \mid \theta_i) = p_{i1}$ is defined as the probability of correctly responding to the first item. Next, the conditional observed score distributions are accumulated across all ability levels to attain the observed score distribution for examinees of various ability levels. Assuming a continuous ability distribution, the marginal observed score distribution is calculated by:

$$
f(x) = \int f(x \mid \theta)\, \psi(\theta)\, d\theta, \tag{3.2}
$$

where $\psi(\theta)$ refers to the distribution of ability, $\theta$.
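A minimal R sketch of the recursion in Equation (3.1), together with a discrete (quadrature) approximation of the integral in Equation (3.2), is given below. It assumes the 3PL model used for data generation in this study (Equation (3.4), presented later) and treats the item parameters and quadrature points/weights as generic inputs rather than the operational values.

```r
p3pl <- function(theta, a, b, c, D = 1.7) {
  # 3PL probability of a correct response for one ability value and vectors of
  # item parameters (see Equation 3.4)
  c + (1 - c) * plogis(D * a * (theta - b))
}

lord_wingersky <- function(p) {
  # p: correct-response probabilities for one examinee, one entry per item.
  # Returns the conditional distribution of number-correct scores 0, ..., length(p).
  f <- c(1 - p[1], p[1])                        # distribution after the first item
  if (length(p) > 1) {
    for (r in 2:length(p)) {
      f_new <- numeric(r + 1)
      f_new[1]     <- f[1] * (1 - p[r])                           # x = 0
      f_new[2:r]   <- f[2:r] * (1 - p[r]) + f[1:(r - 1)] * p[r]   # 0 < x < r
      f_new[r + 1] <- f[r] * p[r]                                 # x = r
      f <- f_new
    }
  }
  f
}

marginal_dist <- function(a, b, c, quad_pts, quad_wts) {
  # Accumulate conditional distributions over quadrature points, approximating
  # the integral in Equation (3.2) by a weighted sum.
  fx <- numeric(length(a) + 1)
  for (k in seq_along(quad_pts)) {
    fx <- fx + lord_wingersky(p3pl(quad_pts[k], a, b, c)) * quad_wts[k]
  }
  fx / sum(fx)   # normalize in case the quadrature weights do not sum exactly to 1
}

# Example with arbitrary illustrative item parameters and a normal ability grid:
# q  <- seq(-4, 4, by = 0.1)
# fx <- marginal_dist(a = rep(1, 5), b = seq(-1, 1, length.out = 5), c = rep(.2, 5),
#                     quad_pts = q, quad_wts = dnorm(q) / sum(dnorm(q)))
```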

Next, the conditional observed score distributions are accumulated across all ability levels to attain the observed score distribution for examinees of various ability levels. Assuming a continuous ability distribution, the marginal observed score distribution is calculated by

$$f(x) = \int f(x \mid \theta)\,\psi(\theta)\,d\theta, \qquad (3.2)$$

where $\psi(\theta)$ refers to the distribution of ability, $\theta$. Generally, when BILOG (Zimowski, Muraki, Mislevy, & Bock, 2006) is implemented, $\psi(\theta)$ is specified as a discrete distribution on a finite number of equally spaced points (i.e., quadrature points) to approximate the integral. The integration can then be replaced by a summation, lessening the computational burden:

$$f(x) = \sum_i f(x \mid \theta_i)\,\psi(\theta_i). \qquad (3.3)$$

Applying the identical steps, the observed score distribution for Form Y can be found. Subsequently, four different distributions can be designated using Equation (3.3):

$f_1(x) = \sum_i f(x \mid \theta_i)\,\psi_1(\theta_i)$ is the Form X distribution for Population 1;

$f_2(x) = \sum_i f(x \mid \theta_i)\,\psi_2(\theta_i)$ is the Form X distribution for Population 2;

$g_1(y) = \sum_i g(y \mid \theta_i)\,\psi_1(\theta_i)$ is the Form Y distribution for Population 1;

$g_2(y) = \sum_i g(y \mid \theta_i)\,\psi_2(\theta_i)$ is the Form Y distribution for Population 2.

Finally, conventional equipercentile equating is implemented on these quantities with pre-specified population synthetic weights. As for all of the equating methods used in the simulation process, synthetic weights of $w_1 = w_2 = .5$ were used for the IRT observed score equating. Further discussion of IRT observed score equating can be found in Kolen and Brennan (2004).

Rationale

Several justifications can be made for the use of IRT observed score equating as the criterion equating relationship in this dissertation. First, since the equating methods being scrutinized are all non-IRT methods (FE, CE, and KE), none of the equating methods gains an unnecessary advantage from using IRT observed score equating as a criterion. Also, transformation of IRT scales, which can be a source of error, is unnecessary in the given situation: the item parameters used to obtain the criterion equating relationship are located on the same IRT scale.

Furthermore, IRT observed score equating can be viewed as the equipercentile equating for the population as long as the IRT assumptions hold, making the method a sensible true equating procedure (Kolen & Brennan, 2004; Wang et al., 2008). Also, equating error arising from the use of common-item sets in the CINEG design will be reduced when IRT observed score equating is implemented, because the common items do not have a direct role in deriving the equivalent scores as they do with FE or CE. More specifically, since the true item parameters are already on the same scale, a scale transformation process through a common-item set is not necessary, and a direct equipercentile equating is conducted using the full-length forms to obtain the true equating relationships.

Procedure

In this section, the steps taken in the simulation procedure are described. First, the dichotomous responses for each form were generated using R (R Development Core Team, 2011). An IRT 3PL model was assumed for the data generation. From the two population distributions of interest, 100 sets of N examinees having an ability parameter $\theta$ were randomly sampled. As mentioned above, the sample extracted from the standard normal distribution, N(0, 1), was always treated as the group administered the old form, Form Y. For each examinee, the probability of person $i$ correctly responding to item $j$ was calculated as

$$p_{ij} = p_{ij}(\theta_i;\, a_j, b_j, c_j) = c_j + (1 - c_j)\,\frac{\exp[D a_j(\theta_i - b_j)]}{1 + \exp[D a_j(\theta_i - b_j)]}, \qquad (3.4)$$

where $\theta_i$ is the proficiency parameter for person $i$; $a_j$, $b_j$, and $c_j$ are the item parameters for item $j$, referring to the discrimination, difficulty, and pseudo-chance parameters, respectively; and $D$ is typically set to 1.7, serving as a scaling constant (Lord, 1980). When computing the probability of a correct answer, the item parameters substituted into Equation (3.4) were those retrieved from the constructed test forms discussed in the previous section. Then, the calculated probability $p_{ij}$ was compared to a random number drawn from a uniform distribution ranging from 0 to 1, U(0, 1). The method implemented for the response simulation in this dissertation is based on the probability integral transformation theorem, according to which the cumulative distribution function of a continuous random variable has a uniform distribution (Casella & Berger, 2001). Therefore, if $p_{ij}$ was larger than the randomly drawn number, a 1 was assigned (i.e., examinee $i$ correctly responded to item $j$); otherwise, a 0 was assigned (i.e., examinee $i$ incorrectly responded to item $j$). Following the process given above, 100 replicated datasets containing 0/1 responses on K items from N examinees were created.
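A compact R sketch of this data-generation step is given below; the function `gen_responses` and its argument names are hypothetical illustrations of the procedure just described, not the code actually used in the study.

```r
# Minimal sketch of 3PL response generation via the probability integral
# transform; names and example values are illustrative, not the study's code.
gen_responses <- function(theta, a, b, cc, D = 1.7) {
  # probability matrix: rows = examinees, columns = items (Equation 3.4)
  p <- sapply(seq_along(a), function(j) {
    cc[j] + (1 - cc[j]) / (1 + exp(-D * a[j] * (theta - b[j])))
  })
  u <- matrix(runif(length(p)), nrow = nrow(p))  # U(0, 1) draws
  (p > u) * 1L                                   # 1 = correct, 0 = incorrect
}

# Example: 2,000 examinees drawn from N(0, 1) answering a hypothetical
# 60-item form with arbitrary (illustrative) item parameters.
set.seed(1)
theta <- rnorm(2000)
resp  <- gen_responses(theta,
                       a  = runif(60, 0.5, 2.0),
                       b  = rnorm(60),
                       cc = runif(60, 0.1, 0.25))
```

Repeating such draws for each population and form, with the study's actual item parameters, produces the replicated 0/1 data sets described above.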

Using the simulated datasets for Form X and Form Y, the 29 equating relationships described in the previous section were estimated. Various functions written in the C language in Equating Recipes (Brennan et al., 2009) were implemented to obtain the estimated equating functions. Each equating result produced from each replicated dataset was saved. The previous steps were replicated 100 times under each condition; as a result, 100 estimated equating relationships were acquired. The 100 estimated equating functions were then compared to the criterion equating relationship using the evaluation criteria described in the next section.

Evaluation Criteria

Bias, standard error of equating (SE), and root mean square error of equating (RMSE) were used as evaluation criteria. Conditional indices for each score point were computed, as well as weighted indices considering all score points overall. Each index captures a different type of equating error.

Bias, an index quantifying the systematic error of an estimator $\hat{e}_Y(x_i)$ at a particular score point $x_i$, was defined as the difference between the average equivalent score across the 100 replications and the true equivalent score. The equation for calculating bias is given as

$$\mathrm{Bias}[\hat{e}_Y(x_i)] = \bar{\hat{e}}_Y(x_i) - e_Y(x_i), \qquad (3.5)$$

where $\bar{\hat{e}}_Y(x_i)$ represents the mean across the 100 replications, and $e_Y(x_i)$ denotes the true equivalent score (i.e., the equivalent score resulting from the IRT observed score equating; a detailed discussion was presented previously).

Weighted absolute bias (WAB), considering all score points simultaneously, was calculated by aggregating the weighted bias at each score point. Because positive and negative values of bias can cancel each other out when summed, absolute values were used instead of the original bias values in the summation (Cui, 2006; Hou, 2007). (For all of the weighted indices, the weighted conditional error indices were averaged across score points, where the weights came from the probability distribution of Form X.) Mathematically, WAB was computed by

$$\mathrm{WAB}[\hat{e}_Y(x)] = \sum_{x_i=0}^{K} P_{x_i}\left|\mathrm{Bias}[\hat{e}_Y(x_i)]\right|, \qquad (3.6)$$

where $P_{x_i}$ refers to the proportion of examinees at score point $x_i$, and $K$ indicates the number of items on the total test form.

The standard error (SE), capturing the random error at each score point $x_i$, was defined as the square root of the average of the squared differences between the estimated equivalent score and the average equivalent score across the 100 replications:

$$\mathrm{SE}[\hat{e}_Y(x_i)] = \sqrt{\frac{1}{100}\sum_{r=1}^{100}\left[\hat{e}_{Y,r}(x_i) - \bar{\hat{e}}_Y(x_i)\right]^2}, \qquad (3.7)$$

where $r$ denotes each replication.

Weighted standard error (WSE), considering all score points simultaneously, was calculated by computing the square root of the sum of the weighted conditional variances:

$$\mathrm{WSE}[\hat{e}_Y(x)] = \sqrt{\sum_{x_i=0}^{K} P_{x_i}\left\{\mathrm{SE}[\hat{e}_Y(x_i)]\right\}^2}. \qquad (3.8)$$

Root mean square error (RMSE), considering systematic and random error together at each score point $x_i$, was also computed. RMSE captures the total error in the equating procedure by combining systematic and random error. It was defined as the square root of the sum of the squared bias and the squared standard error:

$$\mathrm{RMSE}[\hat{e}_Y(x_i)] = \sqrt{\left\{\mathrm{Bias}[\hat{e}_Y(x_i)]\right\}^2 + \left\{\mathrm{SE}[\hat{e}_Y(x_i)]\right\}^2}. \qquad (3.9)$$

Weighted root mean square error (WRMSE), where all score points are considered together, was computed as

$$\mathrm{WRMSE}[\hat{e}_Y(x)] = \sqrt{\sum_{x_i=0}^{K} P_{x_i}\left\{\mathrm{RMSE}[\hat{e}_Y(x_i)]\right\}^2}. \qquad (3.10)$$

The evaluation criteria expressed above were computed under each condition studied in the simulation procedure. Then, the best-performing smoothing method was identified under different conditions by comparing the calculated indices, with the aim of providing a guideline for practitioners.
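To make the computations concrete, the R sketch below evaluates Equations (3.5) through (3.10) for one condition; the inputs (`e_hat`, a replications-by-score-points matrix of estimated equivalents; `e_true`, the criterion equivalents; and `p_x`, the Form X score distribution used as weights) are assumed objects, not part of the study's code.

```r
# Minimal sketch of the evaluation indices in Equations (3.5)-(3.10).
# Assumed inputs: e_hat = 100 x (K+1) matrix of estimated Form Y equivalents
# (rows = replications, columns = Form X score points 0..K); e_true = vector
# of criterion equivalents; p_x = vector of Form X score-point proportions.
equating_error <- function(e_hat, e_true, p_x) {
  e_bar <- colMeans(e_hat)                            # mean over replications
  bias  <- e_bar - e_true                             # Equation (3.5)
  se    <- sqrt(colMeans(sweep(e_hat, 2, e_bar)^2))   # Equation (3.7)
  rmse  <- sqrt(bias^2 + se^2)                        # Equation (3.9)
  list(bias = bias, se = se, rmse = rmse,
       WAB   = sum(p_x * abs(bias)),                  # Equation (3.6)
       WSE   = sqrt(sum(p_x * se^2)),                 # Equation (3.8)
       WRMSE = sqrt(sum(p_x * rmse^2)))               # Equation (3.10)
}
```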

CHAPTER FOUR
RESULTS

In the current chapter, effects on equating error are examined for six different main factors (sample size, group difference, proportion of common items, form difference, common item type, and spread of difficulty parameters in the common-item sets) and for different smoothing parameters and equating methods. Results are reported aggregated across the range of test score points. Tables A5 through A22 display the aggregated summary statistics for each main effect; Tables A23 and A24 provide the aggregated summary statistics for the different smoothing parameters and equating methods, respectively. These aggregated statistics were computed by averaging each statistic across all other conditions. For instance, for the values in Table A5, the weighted absolute bias values computed for the 64 conditions at each of the sample sizes of 300, 2,000, and 6,000 were averaged. For all comparisons, only results with the smoothing parameters producing the smallest error are emphasized. Comparisons among aggregated conditions are appropriate due to the way test forms were set up in the simulation process. For example, when the aggregated summary statistics for different sample sizes were obtained, the various testing conditions under the sample size of 300 could be thought of as 64 random replications encompassing the summary statistics for all the conditions where a sample size of 300 was used.
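As a simple illustration of this kind of aggregation, the line below averages a summary statistic over all conditions sharing each level of the factor of interest; the data frame `results` and its columns (`wab`, `sample_size`, `procedure`) are hypothetical names, not objects from the study's code.

```r
# Hypothetical long-format results: one row per simulated condition and
# equating/smoothing procedure, with its weighted absolute bias (wab).
aggregate(wab ~ sample_size + procedure, data = results, FUN = mean)
```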

Figures B1 through B8 show the best-performing smoothing methods for all 192 conditions considered in this study in terms of weighted absolute bias, weighted standard error, or weighted root mean square error. Since three different equating methods were applied to the 192 conditions, the total number of cells in each figure sums to 576. Each smoothing method is assigned a different color so that the overall pattern of the best-performing methods can be readily examined, with white, red, yellow, and green used for unsmoothed, cubic spline postsmoothing, log-linear presmoothing, and kernel equating, respectively. A separate figure based on total equating error (a composite of systematic and random error) is given for each of the main effects examined. Additional figures for systematic and random error are presented only for the group difference conditions to illustrate the patterns of the best-performing methods in terms of weighted absolute bias and weighted standard error.

Sample Size

Tables A5 through A7 display the aggregated summary statistics for sample sizes of 300, 2,000, and 6,000. The tables show that weighted absolute bias decreased slightly when sample size increased from 300 to 2,000 but remained essentially unchanged when sample size increased from 2,000 to 6,000. For weighted standard error, values decreased rapidly as the sample size increased. The amount of decrease was much larger for weighted standard error than for weighted absolute bias and weighted root mean square error, indicating that increasing sample size has less impact on systematic error than on random error in equating.

Figure B1 highlights the smoothing methods that performed best in reducing total equating error. The first three columns display the results for the three equating methods (frequency estimation, modified frequency estimation, and chained equipercentile equating) for a sample size of 300, the next three for a sample size of 2,000, and the last three for a sample size of 6,000. Cubic spline postsmoothing tended to produce the smallest total equating error with a sample size of 300 (54% of the conditions). The proportion of occasions where log-linear presmoothing or kernel equating performed best increased from 46% to 66% with an increase in sample size from 300 to 6,000. Within the chained equipercentile method, kernel equating produced increasingly smaller total equating error than did presmoothing and postsmoothing as sample size increased from 300 to 6,000.

Frequency Estimation Equipercentile Equating

In all cases for frequency estimation equating (see Table A5), weighted absolute bias was lower with log-linear presmoothing than with no smoothing. Cubic spline postsmoothing always produced higher weighted absolute bias than did the unsmoothed method, and log-linear presmoothing always produced the lowest. Systematic error also increased when frequency estimation was used in conjunction with the kernel equating method at N = 6,000.

For weighted standard error, values were always lower with smoothing than without smoothing irrespective of the smoothing method used, with reductions in random error ranging from 35% to 45%. Smoothing effects were inversely related to sample size, with stronger effects at smaller than at larger sample sizes. Among the smoothing methods, cubic spline postsmoothing consistently displayed the smallest random error.

As with random error, the amount of total equating error decreased with smoothing regardless of the sample size used, with reductions ranging between 1% and 14%. As the sample size increased, the effect of smoothing as a means of reducing total error diminished. When frequency estimation was used, log-linear presmoothing consistently yielded smaller weighted root mean square error than did cubic spline postsmoothing.

Modified Frequency Estimation Equipercentile Equating

For modified frequency estimation, weighted absolute bias decreased with smoothing in most cases. The only exceptions were using postsmoothing with N = 300 and presmoothing with N = 6,000. Log-linear presmoothing produced the smallest weighted absolute bias with N = 300, whereas cubic spline postsmoothing produced the smallest with sample sizes of 2,000 and 6,000.

In all cases, smoothing reduced weighted standard errors, with reductions ranging from 33% to 44%. With larger sample sizes, the effect of smoothing on the reduction of random error was generally smaller. Cubic spline postsmoothing produced the smallest random error regardless of the sample size considered.

The amount of total equating error was always smaller with any type of smoothing, with reductions ranging from 2% to 18%. As sample size increased, the proportion of reduction in total equating error decreased. With N = 300, log-linear presmoothing performed better than cubic spline postsmoothing, but with sample sizes of 2,000 and 6,000, the reverse was true.

Chained Equipercentile Equating

For chained equipercentile equating, weighted absolute bias increased in almost every instance when either cubic spline postsmoothing or log-linear presmoothing was used, regardless of sample size. Combining kernel equating with chained equipercentile equating consistently showed smaller systematic error than did chained equipercentile equating paired with either presmoothing or postsmoothing.

In all cases, weighted standard errors were reduced by 37% to 47% when smoothing was applied. Differences in random error produced by unsmoothed versus smoothed methods tended to become smaller with increases in sample size, with cubic spline postsmoothing producing the smallest weighted standard error regardless of the sample size used.

Cubic spline postsmoothing displayed the smallest weighted root mean square error with the sample size of 300, whereas kernel combined with chained equipercentile equating produced the smallest with sample sizes of 2,000 and 6,000. Total equating error was reduced with smoothing in all situations, with reductions ranging from 4% to 25%, but the effect was greater at smaller sample sizes.

Summary

Using larger sample sizes reduced weighted absolute bias, weighted standard error, and weighted root mean square error, but these effects were more prominent for random error than for systematic error. Increasing sample size did not have a noteworthy effect on the relative performance of smoothing methods when frequency estimation and chained equipercentile equating were used.

Regardless of the sample size used, weighted absolute bias and weighted root mean square error were smaller with presmoothing than with postsmoothing, but the opposite was true for weighted standard error. When modified frequency estimation was used with sample sizes of 2,000 or larger, postsmoothing produced smaller systematic, random, and total equating error compared to presmoothing. With the chained equipercentile method, kernel equating always resulted in the smallest weighted absolute bias, and postsmoothing resulted in the smallest amount of random error. For total error with the chained equipercentile method, postsmoothing did best at N = 300, whereas kernel equating did best at sample sizes of 2,000 and 6,000.

Group Mean Difference

Aggregated summary statistics for group mean ability differences are given in Tables A8 through A10. When the old and new groups differed more in average proficiency, weighted absolute bias increased, and this was especially true for frequency estimation in comparison with the other two equating methods. In contrast, chained equipercentile equating showed smaller increases in systematic error and was therefore more robust to group difference effects than were the other methods. Weighted standard errors also increased when the group differences were larger, but the amount of the increase was much smaller than for weighted absolute bias. In most cases, the group administered the old form (Form Y) was fixed at N(0, 1), and the easier form (with the average of the difficulty parameters always equaling 0) was given to the old group. When the reverse was true for a group difference of -.2, the values of weighted absolute bias, weighted standard error, and weighted root mean square error did not change much, with differences between the aggregated indices equaling .04 or less.

Figures B2 through B4 show the general pattern of the best-performing equating/smoothing method under each condition. Within these figures, the first quarter (at the top of each figure) shows patterns for a group difference of .05, followed by group differences of .2, .5, and -.2, respectively.

In terms of weighted absolute bias (see Figure B2), as the group differences increased, the proportion of conditions where the unsmoothed methods outperformed the smoothed methods decreased from 28% to 19%. The proportion dropped to 11% when the group difference equaled -.2. Log-linear presmoothing produced the smallest weighted absolute bias under a majority of conditions when group differences equaled .05, .2, and .5 (43%, 41%, and 38% of the cases, respectively), whereas cubic spline postsmoothing provided the smallest when the group difference was -.2 (58%). For the chained equipercentile method, kernel equating typically produced smaller systematic error than did the smoothing methods as the group difference increased (13% when groups differed by .05, and 27% when groups differed by .5).

Figure B3 shows the pattern for methods producing the smallest weighted standard errors when different group differences were considered. Cubic spline postsmoothing produced the smallest random error in most situations (82% of all conditions). However, as group differences increased, the number of conditions where log-linear presmoothing and kernel equating outperformed postsmoothing increased (e.g., 6% when groups differed by .05, and 42% when groups differed by .5). In no instances did the unsmoothed equating methods outperform the smoothed methods.

The methods performing best in terms of total equating error are given in Figure B4. For group differences of .05, .2, and .5, log-linear presmoothing resulted in the smallest weighted root mean square errors under most conditions (45%, 43%, and 47% of the conditions, respectively); for a group difference of -.2, cubic spline postsmoothing performed best (74% of the conditions). For the chained equipercentile method, kernel equating performed better than smoothing in a majority of situations (52%) across group differences.

Frequency Estimation Equipercentile Equating

Table A8 shows that weighted absolute bias increased when postsmoothing was used with frequency estimation, except when the group difference was -.2. When log-linear presmoothing and kernel frequency estimation equating were applied, weighted absolute bias always decreased. Log-linear presmoothing produced smaller weighted absolute bias than did cubic spline postsmoothing when groups differed by .05, .2, and .5, whereas cubic spline postsmoothing produced the smallest when the group difference equaled -.2.

Regardless of the degree of group differences, weighted standard errors decreased with smoothing, with the proportion of reduction ranging from 35% to 44% across conditions. Among the smoothing methods, cubic spline postsmoothing consistently produced the best results.

Total equating error, as indexed by weighted root mean square error, always decreased when smoothing was applied, with reductions ranging between 3% and 13%. The proportion of the reduction tended to get smaller as the group differences increased. The pattern of best-performing smoothing methods for weighted root mean square error was identical to that for weighted absolute bias, with log-linear presmoothing outperforming postsmoothing with group differences of .05, .2, and .5, and the opposite being true when the group difference was -.2.

Modified Frequency Estimation Equipercentile Equating

With modified frequency estimation, weighted absolute bias increased only when cubic spline postsmoothing was used with a group difference equaling -.2. In all other instances, systematic error decreased with smoothing compared to no smoothing. Cubic spline postsmoothing produced the smallest weighted absolute bias when group differences equaled .05 and .2, whereas log-linear presmoothing performed best with group differences of .5 and -.2.

Weighted standard errors always decreased with smoothing when modified frequency estimation was used, with reductions ranging from 32% to 43%. The proportion of the reduction did not show a consistent pattern in relation to group ability differences. Cubic spline postsmoothing always produced the smallest weighted standard errors regardless of the degree of group difference considered.

Total equating error was decreased by all types of smoothing in all instances, with reductions ranging from 8% to 14%. As the group differences increased, the effect of smoothing diminished. With group differences of .05, .2, and -.2, cubic spline postsmoothing outperformed log-linear presmoothing, but the reverse was true with a large group difference of .5.

Chained Equipercentile Equating

Weighted absolute bias increased with both cubic spline postsmoothing and log-linear presmoothing when chained equipercentile equating was used, except when cubic spline postsmoothing was used with a group difference of -.2. Kernel combined with chained equipercentile equating produced less systematic error when groups differed by .05, .2, and .5 than did either postsmoothing or presmoothing. However, with a group difference of -.2, cubic spline postsmoothing produced the smallest weighted absolute bias.

Smoothing reduced the amount of random error by 37% to 47% in every situation investigated. As with frequency estimation and modified frequency estimation, no consistent relationship emerged between reductions in weighted standard errors and the extent of group differences, with cubic spline postsmoothing producing the smallest weighted standard error for all group differences.

Weighted root mean square error always decreased with smoothing, with reductions ranging from 12% to 18%. The pattern of best-performing methods was identical to that for weighted absolute bias, with kernel combined with chained equipercentile equating producing the smallest weighted root mean square error when group differences were .05, .2, and .5, and cubic spline postsmoothing producing the smallest with a group difference of -.2.

Summary

As average differences in ability between groups increased, systematic, random, and total equating error increased, but systematic error was affected more than random error. Cubic spline postsmoothing resulted in the smallest values for random error regardless of the extent of group differences and equating methods used.

Results for total equating error aligned with those for systematic error. When frequency estimation or chained equipercentile equating was used, presmoothing outperformed postsmoothing with group differences of .05, .2, and .5. When the group difference equaled -.2, postsmoothing produced the smallest total error regardless of the equating method used. Within the chained equipercentile method, kernel equating outperformed both presmoothing and postsmoothing. However, with modified frequency estimation, postsmoothing outperformed presmoothing with group differences of .05 and .2 but underperformed presmoothing with a group difference of .5.

Proportion of Common Items

Table A11 shows that as the proportion of common items increased, weighted absolute bias decreased for all equating methods with or without smoothing applied. The rate of decrease was highest for chained equipercentile equating, followed by modified frequency estimation and frequency estimation. Weighted standard errors (see Table A12) also decreased when the proportion of common items was doubled from 20% to 40%, but the magnitude of reduction was smaller than for systematic error. Consistent with the reductions in both systematic and random error, total equating error decreased with increases in the proportion of common items (see Table A13).

From Figure B5, it is evident that cubic spline postsmoothing performed best when 20% of the items were common (50% of the conditions; see the upper half of the figure), whereas log-linear presmoothing performed best when the proportion of common items doubled (43% of the cases; see the lower half of the figure).

When the chained equipercentile method was used, kernel equating outperformed both presmoothing and postsmoothing in a majority of cases regardless of the proportion of common items used.

Frequency Estimation Equipercentile Equating

When frequency estimation was used (see Table A11), weighted absolute bias decreased with all smoothing methods when 20% of the items were common between forms, but increased with postsmoothing with 40% common items. Overall, log-linear presmoothing performed better than postsmoothing with both common-item proportions.

Smoothing combined with frequency estimation decreased weighted standard errors in all cases, with proportions of reduction ranging from 34% to 43% (see Table A12). These reductions were smaller than for absolute bias and slightly lower when more common items were used. Regardless of the proportion of common items used, cubic spline postsmoothing produced the smallest random error.

Total equating error also was reduced under all conditions when smoothing was combined with frequency estimation, with reductions ranging from 4% to 12% (see Table A13). Postsmoothing outperformed presmoothing with 20% common items, but the reverse was true with 40% common items.

Modified Frequency Estimation Equipercentile Equating

For both frequency estimation and modified frequency estimation, weighted absolute bias decreased with smoothing in all instances except when applying cubic spline postsmoothing with 40% common items. Among the smoothing methods, postsmoothing performed better when 20% of the items were common, whereas presmoothing performed better with 40% common items.

Smoothing decreased weighted standard errors in every case, with proportions of reduction varying from 30% to 42% and somewhat greater reductions observed when the number of common items was doubled.

Regardless of the proportion of common items, cubic spline postsmoothing always outperformed log-linear presmoothing.

Table A13 shows that total equating error decreased with either presmoothing or postsmoothing in all situations in which modified frequency estimation was used, with reductions varying from 6% to 15% depending on the proportion of common items and the smoothing method used. With a larger number of common items, the effect of smoothing was greater. Because the relative amount of systematic error was larger than that of random error, the pattern of results for total error was similar to that for systematic error, with postsmoothing providing less total error with 20% common items and presmoothing providing less error with 40% common items.

Chained Equipercentile Equating

When the chained equipercentile method was used with 20% common items, weighted absolute bias increased only when postsmoothing was implemented. Kernel combined with chained equipercentile equating produced the smallest weighted absolute bias with 20% common items. However, with 40% common items, systematic error increased with any type of smoothing applied, thereby resulting in the unsmoothed methods producing the smallest weighted absolute bias.

When smoothing was combined with chained equipercentile equating, weighted standard errors were reduced by 33% to 46%, with greater reductions with larger numbers of common items. For both common-item proportions, cubic spline postsmoothing provided the smallest random error.

Smoothing also reduced weighted root mean square errors in all cases, with reductions ranging from 9% to 26%. These reductions were of greater magnitude with 40% common items than with 20%. With 20% common items, cubic spline postsmoothing produced the smallest total error, whereas with 40% common items, kernel combined with chained equipercentile equating produced the smallest.

Summary

Doubling the proportion of common items reduced weighted absolute bias, weighted standard error, and weighted root mean square error. The decrease was more noticeable for weighted absolute bias than for weighted standard error, indicating that the difference in common-item proportions had more impact on systematic error than on random error in equating. Cubic spline postsmoothing produced the smallest random errors in every situation and the smallest systematic and total errors when 20% of the items were common. Kernel combined with chained equipercentile equating produced the smallest total error when 40% common-item sets were used.

Form Difference

Tables A14 through A16 contain aggregated summary statistics for different degrees of form differences. Weighted absolute bias approximately doubled in all situations when the form difference increased from .05 to .2. Weighted standard errors also increased, but only negligibly so, with larger form differences. Consistent with the large increase in weighted absolute bias, weighted root mean square errors also increased as form differences increased.

Figure B6 shows that when the old and new forms were similar in average difficulty (i.e., a form difference of .05; upper half of the figure), cubic spline postsmoothing yielded lower total equating error than did log-linear presmoothing for 56% of the conditions. In contrast, log-linear presmoothing produced the smallest total equating error in 48% of the cases when the form difficulty difference was .2 (lower half of the figure). Regardless of form differences, kernel combined with chained equipercentile equating produced the smallest total equating error in more instances than did either presmoothing or postsmoothing combined with chained equipercentile equating.

Frequency Estimation Equipercentile Equating

Table A14 shows that weighted absolute bias decreased with smoothing except when postsmoothing was applied with a form difference of .2. Log-linear presmoothing outperformed cubic spline postsmoothing in all conditions for both form differences.

Weighted standard error was reduced with smoothing in all instances, with proportions of reduction ranging from 36% to 45%. Reductions were larger when forms were more similar in average difficulty, and cubic spline postsmoothing always produced the smallest weighted standard errors.

Total equating error also was reduced in all cases in which smoothing was applied, with reductions ranging from 4% to 11%. Reductions with smoothing were larger when form difficulties were close to each other, with postsmoothing performing better with a form difference of .05 and presmoothing performing better with a form difference of .2.

Modified Frequency Estimation Equipercentile Equating

With modified frequency estimation, weighted absolute bias increased only with postsmoothing when the form difference was .2. In all other cases, systematic error declined with smoothing. Cubic spline postsmoothing displayed the smallest weighted absolute bias for a form difference of .05, whereas presmoothing displayed the smallest for a form difference of .2.

Weighted standard errors were reduced with any type of smoothing, with reductions ranging from 33% to 45%. Applying smoothing was more effective when the old and new forms were more similar in average difficulty. Regardless of the form differences, cubic spline postsmoothing resulted in the smallest random error when modified frequency estimation was used.

Weighted root mean square errors also were reduced with smoothing in every situation, with reductions varying from 5% to 21%. Postsmoothing performed better than presmoothing with a form difference of .05, and presmoothing performed better with a form difference of .2.

Chained Equipercentile Equating

When chained equipercentile equating was used, weighted absolute bias increased with both presmoothing and postsmoothing but diminished for both form differences when kernel equating was used.

Weighted standard errors were reduced by 37% to 47% with smoothing when chained equipercentile equating was used, with greater reductions for similar form difficulties. Among the smoothing methods, cubic spline postsmoothing always produced the smallest amount of random error.

Smoothing reduced weighted root mean square error in all cases by 9% to 25%, with larger reductions observed for similar form difficulties. Cubic spline postsmoothing produced the smallest weighted root mean square error when the forms differed by .05 in average difficulty, whereas kernel combined with chained equipercentile equating performed best when the forms differed by .2.

Summary

Increases in the form difficulty differences resulted in increases in weighted absolute bias, weighted standard error, and weighted root mean square error. The rate of increase for weighted absolute bias was greater than that for weighted standard error, indicating that the difference in form difficulty had a larger impact on systematic error than on random error in equating.

The relative performance of smoothing methods in terms of weighted absolute bias depended more on the equating method than on the form difference. For frequency estimation, log-linear presmoothing produced the smallest weighted absolute bias. For modified frequency estimation, postsmoothing yielded less bias when forms were similar in difficulty. For the chained equipercentile method, kernel equating yielded less bias than both presmoothing and postsmoothing.

Cubic spline postsmoothing resulted in the smallest weighted standard error values regardless of the equating method and form differences considered. Weighted root mean square error differed in accordance with the form differences, with cubic spline postsmoothing providing smaller total error than log-linear presmoothing with a form difference of .05, and log-linear presmoothing providing smaller total error with a form difference of .2. Kernel chained equipercentile equating produced the smallest weighted root mean square error under most conditions when forms were dissimilar in difficulty.

Common Item Type

Tables A17 through A19 show aggregated summary statistics for internal versus external common-item sets. In general, weighted absolute bias, weighted standard errors, and weighted root mean square errors were smaller when external common items were used, and these effects were more pronounced for absolute bias than for standard errors. Among the equating methods, frequency estimation equipercentile equating produced the most similar results for internal and external common-item sets.

Figure B7 shows the pattern of best-performing smoothing methods for total equating error, with the upper half representing internal common-item sets and the lower half representing external common-item sets. Cubic spline postsmoothing produced the smallest weighted root mean square error in most cases for both common item types (43% when the common items were internal, and 41% when the common items were external). In general, the best-performing smoothing methods were quite similar for internal and external common-item sets. As has typically been the case, kernel equating resulted in the smallest weighted root mean square error when the chained equipercentile method was used.

Frequency Estimation Equipercentile Equating

Table A17 shows that, for frequency estimation and both types of common-item sets, weighted absolute bias increased whenever postsmoothing was used and decreased whenever log-linear presmoothing was used.

Smoothing of any type reduced random error, with reductions ranging from 31% to 51%, but this effect was more apparent with external common items. Overall, cubic spline postsmoothing resulted in the smallest weighted standard error.

Weighted root mean square errors also were reduced whenever smoothing was performed, with reductions ranging from 6% to 8%, and these effects did not differ much between the internal and external common-item sets. However, the best-performing smoothing method varied for the two common item types, with cubic spline postsmoothing outperforming log-linear presmoothing with internal common items, and log-linear presmoothing outperforming cubic spline postsmoothing with external common items.

Modified Frequency Estimation Equipercentile Equating

With modified frequency estimation, weighted absolute bias was reduced with smoothing, except when cubic spline postsmoothing was used with external common items.

Weighted standard errors always were reduced, by 27% to 49%, with smoothing combined with modified frequency estimation. As with frequency estimation, cubic spline postsmoothing resulted in the smallest weighted standard error regardless of the common item type.

Weighted root mean square error also was reduced with both presmoothing and postsmoothing, by 7% to 18%, and these reductions were larger with external common items. Smoothing methods providing the smallest weighted root mean square error were identical to those for frequency estimation, with cubic spline postsmoothing doing best for internal common items and log-linear presmoothing doing best for external common items.

Chained Equipercentile Equating

With the chained equipercentile method, systematic error increased with both presmoothing and postsmoothing when the common items were external, but decreased with kernel equating for both types of common items.

Weighted standard error was reduced with any type of smoothing for both internal and external common items. Reductions ranged from 33% to 54% and were generally greater with external common-item sets. For both common-item types, cubic spline postsmoothing resulted in the smallest weighted standard errors.

Total equating error always decreased with smoothing, with reductions ranging from 10% to 26%. These reductions again were larger when the common items were external. Cubic spline postsmoothing produced the lowest total error with internal common items, whereas kernel equating produced the lowest with external common items.

Summary

In general, weighted absolute bias, weighted standard error, and weighted root mean square error were smaller when external common items were used. The difference in the values was larger for weighted absolute bias than for weighted standard error, suggesting that the use of different common item types has a larger effect on systematic error than on random error in equating.

Weighted absolute bias was affected more by equating methods than by different common item types. For example, log-linear presmoothing outperformed the other methods when frequency estimation was used, and kernel equating performed best when the chained equipercentile method was used. However, with modified frequency estimation, postsmoothing performed best with internal common items, whereas presmoothing performed best with external common items. Cubic spline postsmoothing yielded the smallest weighted standard error regardless of the common item types or equating methods used.

Smoothing methods producing the smallest weighted root mean square error varied with common item types. With internal common-item sets, postsmoothing performed better than presmoothing for all three equating methods investigated.

With external common-item sets, presmoothing performed best for the frequency estimation methods, and kernel equating performed best for the chained equipercentile method.

Spread of Difficulty Parameters

For every equating method used, weighted absolute bias and weighted root mean square errors were lower, and weighted standard errors were slightly higher, for midi than for mini common-item sets (see Tables A20 to A22). Figure B8 shows the best-performing smoothing methods for weighted root mean square errors under the mini versus midi conditions. The upper and lower halves of the figure show results for mini and midi common-item sets, respectively. For mini sets, postsmoothing outperformed presmoothing in 50% of the conditions, whereas for midi common-item sets, presmoothing outperformed postsmoothing in 45% of the conditions. For both sets, kernel equating yielded the lowest total equating error under most conditions when the chained equipercentile method was used.

Frequency Estimation Equipercentile Equating

Table A20 shows that weighted absolute bias increased whenever cubic spline postsmoothing was combined with frequency estimation. The relative performance of the smoothing methods did not vary between midi and mini common-item sets, with presmoothing producing the smallest weighted absolute bias.

Weighted standard errors were reduced by any type of smoothing, with reductions ranging from 36% to 44%. These reductions were larger for mini common-item sets, and cubic spline postsmoothing consistently produced the lowest weighted standard errors.

Smoothing reduced total equating error by 6% to 8%, and these reductions did not differ much between mini and midi common items. Postsmoothing outperformed presmoothing with mini common-item sets, whereas presmoothing outperformed postsmoothing with midi common-item sets.

Modified Frequency Estimation Equipercentile Equating

With modified frequency estimation, weighted absolute bias increased only when postsmoothing was used with mini common-item sets. Log-linear presmoothing produced the smallest weighted absolute bias with mini common-item sets, and cubic spline postsmoothing provided the lowest with midi common-item sets.

Smoothing reduced random errors by 33% to 43%, with slightly better results observed for midi than for mini sets. Overall, cubic spline postsmoothing produced the smallest random error.

Smoothing reduced total equating error for both mini and midi common items by 10% to 12%. Presmoothing yielded better results with mini common-item sets, and postsmoothing yielded better results with midi common-item sets.

Chained Equipercentile Equating

For the chained equipercentile method, weighted absolute bias increased with both log-linear presmoothing and cubic spline postsmoothing, but decreased with kernel equating.

Smoothing reduced weighted standard errors by 37% to 46%, and these effects were stronger for mini common-item sets. For both the mini and midi common-item conditions, cubic spline postsmoothing resulted in the smallest weighted standard errors.

Smoothing reduced weighted root mean square errors by 13% to 17%, and these reductions were slightly larger when midi common items were used. Postsmoothing produced the smallest total equating error with mini common-item sets, and kernel equating produced the smallest with midi common-item sets.

Summary

Overall, midi common-item sets provided smaller weighted absolute bias and weighted root mean square error and slightly larger weighted standard error than did mini common-item sets. The difference in weighted absolute bias was larger than that in weighted standard error, indicating that differences in the spread of difficulties for common-item sets affected systematic error more than random error.

Weighted absolute bias for log-linear presmoothing was smaller than that for cubic spline postsmoothing when frequency estimation was used. For modified frequency estimation, presmoothing performed best with mini common-item sets, and postsmoothing performed best with midi common-item sets. For the chained equipercentile method, kernel equating outperformed both presmoothing and postsmoothing regardless of the spread of difficulty parameters in the common items. In terms of weighted standard error, cubic spline postsmoothing combined with all three equating methods outperformed the other equating/smoothing procedures for both the mini and midi common-item conditions. For weighted root mean square error with mini common-item sets, postsmoothing produced the smallest values for the frequency estimation and chained equipercentile methods, and presmoothing did so for modified frequency estimation. For midi common-item sets, log-linear presmoothing was best for frequency estimation, cubic spline postsmoothing was best for modified frequency estimation, and kernel equating was best for the chained equipercentile method.

Smoothing Parameter

Table A23 shows the aggregated summary statistics for different smoothing parameters. When kernel equating was applied (both kernel frequency estimation and kernel chained equipercentile equating), the relative performance of the different smoothing parameters was similar to situations where log-linear presmoothing was paired with the unsmoothed equating methods. The trend in the amount of weighted absolute bias, weighted standard error, and weighted root mean square error for different smoothing parameters was identical for frequency estimation, modified frequency estimation, and chained equipercentile equating.

With cubic spline postsmoothing, weighted absolute bias was smallest when S = .1 and increased as S increased (i.e., with more smoothing). With log-linear presmoothing, weighted absolute bias was generally smaller when a larger number of parameters was preserved (i.e., less smoothing) either in the marginal distributions or in the bivariate distribution. Consequently, presmoothing produced the smallest systematic error when the smoothing parameters equaled (6, 6, 3).

Regardless of the equating method used, weighted standard errors decreased with increases in the cubic spline postsmoothing parameter S and therefore were smallest when S = .5. With polynomial log-linear presmoothing, weighted standard errors were smaller when fewer parameters were preserved in the marginal distributions and fewer parameters were preserved in the bivariate distribution, with the number preserved in the marginal distributions carrying the heavier weight. Hence, presmoothing with parameters (4, 4, 1) resulted in the smallest weighted standard error.

Weighted root mean square errors decreased when the postsmoothing parameter S increased, when fewer parameters were preserved in the marginal distributions, and when more parameters were preserved in the bivariate distribution. Consequently, presmoothing with parameters (4, 4, 3) produced the smallest total error.

Equating Method

In Table A24, weighted absolute bias, weighted standard error, and weighted root mean square error are reported for the three equating methods examined. When smoothing methods were fixed, frequency estimation yielded the largest weighted absolute bias and weighted root mean square errors, and chained equipercentile equating showed the smallest. Weighted standard errors displayed an entirely different pattern, with chained equipercentile equating producing the largest values and frequency estimation producing the smallest. These differences reflect the greater effect that equating methods have on systematic error as compared to random error.

Figure B4 highlights the smoothing methods producing the smallest amount of total equating error. The first three columns, representing frequency estimation, show that the proportions of conditions in which log-linear presmoothing and cubic spline postsmoothing performed best were quite similar (46% and 45% of the cases, respectively). The next three columns, for modified frequency estimation, show that log-linear presmoothing outperformed cubic spline postsmoothing for 58% of the conditions. The pattern for the chained equipercentile method in the last three columns indicates that kernel equating produced the smallest total error for 52% of the cases, followed by postsmoothing (39%) and presmoothing (9%).

Table A24 shows that weighted absolute bias increased relative to the unsmoothed methods whenever postsmoothing was implemented. However, with log-linear presmoothing, weighted absolute bias increased only with modified frequency estimation and chained equipercentile equating. With frequency estimation, log-linear presmoothing produced the smallest amount of systematic error; with modified frequency estimation, the unsmoothed method provided the smallest; and with the chained equipercentile method, kernel equating provided the smallest.

For weighted standard errors (see Table A24), values were lowest when postsmoothing was paired with any type of equating method. In terms of total equating error, kernel equating provided the smallest values with the frequency estimation and chained equipercentile methods, whereas postsmoothing produced the smallest value with modified frequency estimation.

Further Explorations into Equating Error

In this section, the results for selected additional analyses are given. The first analysis focuses on equating error in relation to each of the 192 simulated test conditions rather than on aggregated indices. Maximum and minimum absolute differences in equating error between smoothing methods for each equating procedure are shown graphically and discussed.

The second additional analysis provides an illustration of equating error conditional on test score points.

Absolute Difference in Equating Error

In many situations, the relative performances of the procedures were determined based on very small differences in the summary statistics. To gain a better understanding of the differences in the performance of smoothing methods, an additional analysis of the absolute differences in equating error was conducted. For each of the 192 conditions considered in this study, the difference between the maximum and minimum summary statistics (i.e., the range) among the various smoothing methods was computed and plotted for each equating method in Figure 4.9. The X-axis in the figure represents the 192 simulation conditions, including three levels of sample size (300, 2,000, and 6,000); four levels of group difference (.05, .2, .5, and -.2); two levels of spread of difficulty parameters (mini and midi); two types of common items (internal and external); two levels of proportion of common items (20% and 40%); and two levels of form difference (.05 and .2). The different colors represent different equating methods: blue for frequency estimation, orange for modified frequency estimation, and gray for chained equipercentile equating. For example, the very first blue data point on the left-hand side represents the difference value (or range) for frequency estimation when the simulated condition involves a sample size of 300, a group difference of .05, a mini/internal common-item set with 20% of the items being common, and a form difference of .05. The data point right next to it represents the same testing condition except for a form difference of .2. Following the ordering scheme of the different factor-level combinations given above, the first one-third of the plot represents the data for the sample size of 300, the next one-third for the sample size of 2,000, and the last one-third for the sample size of 6,000 (separated by the vertical lines shown in Figure 4.9).

The first panel in Figure 4.9 shows the difference ranges in weighted absolute bias for each condition. The difference range in weighted absolute bias across smoothing methods varied from .002 to .43. Under some conditions, the difference range in the amount of bias was quite large with modified frequency estimation, represented by the orange data points. As sample size increased from 300 to 6,000, the difference range in weighted absolute bias tended to decrease. With a sample size of 6,000, the selection of smoothing generally did not have a crucial impact on the amount of systematic error.

The middle panel in Figure 4.9 displays the difference ranges in weighted standard errors under each testing condition examined. Larger sample sizes were associated with smaller differences in random error. For all 192 conditions considered, the difference ranges were less than or equal to .05 when N = 6,000 and less than .1 when N = 2,000. Across the equating methods used, chained equipercentile equating tended to produce larger difference ranges for random error among the smoothing methods applied. Another pattern is evident for random error when the sample size equals 300, as represented by the first third of the data points in Figure 4.9. Within this region, the first quarter of the data points shows group differences equaling .05, and subsequent quarters show group differences of .2, .5, and -.2, respectively. This pattern reveals that when groups have large differences in proficiency, the selection of a smoothing method has a substantial impact on the amount of random error, especially when small samples are used.

Lastly, the bottom panel in Figure 4.9 provides the difference ranges in the weighted root mean square errors for the various smoothing methods. The differences ranged approximately from .01 to .4. Since both weighted absolute bias and weighted standard errors showed smaller differences among the smoothing methods when sample size increased, weighted root mean square errors also displayed smaller difference ranges as the sample size increased.

When the sample size was quite large, the selection of a smoothing method did not have a major influence on the differences in total error among smoothing methods, except for some conditions that displayed erratically large differences with modified frequency estimation. Those data points were identified as conditions involving midi/internal common-item sets with form differences of .05. However, the reason why those conditions produced larger differences among the smoothing methods is unclear.

Equating Error Conditional on Score Points

In this section, equating results for two of the 192 simulated conditions are presented to determine the extent to which bias, standard errors, and root mean square errors varied along the score scale, and which smoothing method worked best at different places on the score scale. These two conditions were chosen because they were expected to show the most contrasting results at different score points and thus to demonstrate the importance of examining the conditional summary statistics. The two situations represent the same conditions except for the spread of difficulty in the internal common-item sets (mini versus midi). In both figures, the sample size equals 2,000, the group difference equals .2, 20% of the items are common items, and the form difference is .05. Conditional bias, standard errors, and root mean square errors are shown, respectively, in the first, second, and third rows for mini-item sets in Figure 4.10 and for midi-item sets in Figure 4.11.

The figures represent seven or five equating/smoothing procedures, depending upon the equating method used. The procedures were chosen to represent the different smoothing parameters for each smoothing method. For the frequency estimation and chained equipercentile methods, conditional summary statistics for seven procedures are displayed: unsmoothed, postsmoothing with S = .1, postsmoothing with S = .3, presmoothing with C = (4, 4, 3), presmoothing with C = (6, 6, 3), kernel equating with C = (4, 4, 3), and kernel equating with C = (6, 6, 3).

Because kernel equating was not applied to the modified frequency estimation method, only five procedures are shown for that method: unsmoothed, postsmoothing with S = .1, postsmoothing with S = .3, presmoothing with C = (4, 4, 3), and presmoothing with C = (6, 6, 3). In both figures, results for frequency estimation and chained equipercentile equating with log-linear presmoothing were almost indistinguishable from results for kernel frequency estimation and kernel chained equipercentile equating when identical sets of smoothing parameters were applied.

Mini Common Items

Figure 4.10 shows that when mini common-item sets were used to equate the two forms, bias showed wavy patterns regardless of the smoothing method applied. The values of bias varied across the raw score points. For some score points, bias increased with smoothing, while for other score points bias decreased. The best methods varied with points along the raw score scale, and the conditional bias plots did not pinpoint a single best-performing smoothing method.

With conditional standard errors, a more clear-cut picture emerged for the most effective equating methods. For all methods, conditional standard errors decreased at every raw score when smoothing was used compared to no smoothing. Cubic spline postsmoothing with a smoothing parameter of .3 resulted in the smallest standard error, with an exception at the low extreme when frequency estimation was used; in this situation, presmoothing with (4, 4, 3) produced the smallest standard errors.

Root mean square errors decreased with smoothing for a majority of score points. For some score points, total equating errors were larger than those for no smoothing, and this was especially true at the lower and/or higher end of the raw score scale depending on the equating method used. Cubic spline postsmoothing with a smoothing parameter of .3 produced smaller total equating error for the mid-range score points but larger total error at the lower and higher extremes compared to other smoothing methods.

smoothing parameter of .3 produced smaller total equating error for the mid-range score points but larger total error at the lower and higher extremes compared to other smoothing methods.

Midi Common Items

Figure 4.11 reveals that with the use of midi common items, the bias curve approximates a unimodal symmetric distribution. The shape of the curve was consistent even when different equating methods were used. Around the score point of 38, bias equaled 0 (corresponding to the horizontal line in the figure). With frequency estimation and modified frequency estimation, cubic spline postsmoothing with S = .3 produced the smallest systematic error for a majority of score points, whereas the performances of presmoothing and postsmoothing were fairly comparable with chained equipercentile equating. With either cubic spline postsmoothing or polynomial log-linear presmoothing, standard errors decreased at all score points examined. When frequency estimation and chained equipercentile equating were used, log-linear presmoothing with (4, 4, 3) displayed the smallest values for most score points, whereas cubic spline postsmoothing with a smoothing parameter of .3 did so for a majority of score points with modified frequency estimation. Conditional root mean square errors displayed identical patterns resembling a bimodal distribution regardless of the equating method used. Total equating error showed a dip around the score point of 38. Cubic spline postsmoothing with S = .3 performed best for a majority of score points, whereas presmoothing with (6, 6, 3) did best at the extremes when frequency estimation and chained equipercentile equating were used. In examining these scenarios for mini and midi common-item sets, it is evident that no single smoothing method outperformed the others for the full range of the score scale. Although only two conditions were examined, they serve to illustrate that
aggregated indices should be interpreted with caution, especially when test use is focused on accuracy at specific score points.

Chapter Summary

Key findings discussed in this chapter can be summarized as follows:

1. Differences in the weighted absolute bias, weighted standard error, and weighted root mean square error depended more heavily on equating methods than on smoothing methods.

2. The effects of the investigated factors differed for different types of equating error. Sample size had a larger effect on random error, whereas effect size, proportion of common items, form difference, common-item type, and spread of difficulty parameters had a larger impact on systematic error.

3. In general, application of smoothing showed potential in reducing total equating error relative to the unsmoothed equating procedures regardless of the testing condition considered. However, differences between unsmoothed and smoothed methods tended to be smaller (i.e., the effect of smoothing was less obvious) with larger sample sizes, larger group differences, a smaller proportion of common items, larger form differences, internal common items, and mini common items.

4. Log-linear presmoothing produced more accurate equating relationships than cubic spline postsmoothing, in terms of total equating error, under a majority of testing conditions: (a) when sample sizes were larger than 2,000, (b) when the more difficult forms were administered to the groups with higher proficiencies, (c) when 40% of the items were common in the old and new forms, (d) when the old and new form difference was large (.2), and (e) when midi common-item sets were used.

5. Cubic spline postsmoothing outperformed log-linear presmoothing in total equating error under a majority of testing conditions: (a) when the sample size
equaled 300, (b) when the more difficult form was administered to less proficient examinees, (c) when 20% of the items were common in the old and new forms, (d) when the old and new forms were quite similar in their average difficulties (a difference of .05), and (e) when mini common-item sets were used.

6. Kernel equating produced the least total equating error under a majority of testing conditions when paired with chained equipercentile equating. In such cases, kernel chained equipercentile equating showed smaller total equating error than either log-linear presmoothing or cubic spline postsmoothing combined with chained equipercentile equating. Kernel chained equipercentile equating produced more accurate equating relationships when sample sizes were larger than 2,000.

7. With cubic spline postsmoothing, total error decreased as the smoothing parameter S increased. With log-linear presmoothing, total error declined as the number of parameters preserved in the marginal distributions decreased.

8. In comparing different equating methods, chained equipercentile equating produced the smallest amount of total error, followed by modified frequency estimation and frequency estimation, regardless of the smoothing methods used.
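To make the error indices summarized above concrete, the following is a minimal R sketch, not the dissertation's actual simulation code, of how conditional bias, standard error, and root mean square error can be computed from replicated equating estimates and then aggregated into weighted indices. The fake estimates, the binomial score-point weights, and the particular weighting conventions are illustrative assumptions; the exact definitions used for the weighted indices in this study may differ in detail.

```r
# Minimal sketch (illustrative, not the study's code): conditional and weighted
# equating-error indices from R replications of an equating at raw scores 0..K.
set.seed(123)
K     <- 50                                  # highest raw score (assumed)
R     <- 500                                 # number of replications (assumed)
truth <- 0:K + 0.3                           # placeholder criterion equivalents
est   <- matrix(truth, R, K + 1, byrow = TRUE) +
         matrix(rnorm(R * (K + 1), sd = 0.4), R, K + 1)   # fake estimated equivalents

bias_x <- colMeans(est) - truth              # conditional bias (systematic error)
se_x   <- apply(est, 2, sd)                  # conditional standard error (random error)
rmse_x <- sqrt(bias_x^2 + se_x^2)            # conditional root mean square error (total error)

w <- dbinom(0:K, K, 0.6)                     # hypothetical score-point weights (relative frequencies)

wab   <- sum(w * abs(bias_x))                # weighted absolute bias
wse   <- sqrt(sum(w * se_x^2))               # weighted standard error (one common convention)
wrmse <- sqrt(sum(w * rmse_x^2))             # weighted root mean square error (one common convention)
round(c(WAB = wab, WSE = wse, WRMSE = wrmse), 3)
```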

CHAPTER FIVE

DISCUSSION AND CONCLUSION

The main purpose of the current study was to compare the performance of different smoothing methods under the common item nonequivalent groups design. A simulation study was conducted using 192 conditions, and the results were provided in the previous chapter according to the main factors of interest. In this chapter, these findings are first described in relation to the original research questions posed. Then, limitations and future research are discussed. Finally, conclusions and implications are addressed.

Research Question 1

1. How do variations in sample size, group difference, proportion of common items, form difference, common-item type (i.e., internal or external), and spread of difficulty in the common items affect equating errors (i.e., bias, standard error, and root mean square error)?

The first main research question dealt with the effect of each main factor on equating error. In general, as sample size increased, systematic, random, and total errors decreased, with decreases in total error due predominantly to decreases in random error. Previous researchers have reported similar results (e.g., Cope & Kolen, 1990; Cui, 2006; Hanson, 1990). This result is not surprising because increases in sample size are known to reduce random error substantially (Cui, 2006). Effects of group differences on equating error varied with the equating method used. Consistent with previous research, chained equipercentile equating was less affected by group differences than was frequency estimation, thereby making chained equipercentile equating the method of choice with large group differences (Hagge & Kolen, 2011; Harris & Kolen, 1990; Hou, 2007; Kim & Lee, 2009; Kolen, 1990; Lawrence & Dorans, 1990; Liu et al., 2011; Livingston et al., 1990; Marco et al., 1983;
Powers & Kolen, 2011; Schmitt, Cook, Dorans, & Eignor, 1990; Sinharay & Holland, 2006; Sinharay & Holland, 2007; Skaggs & Lissitz, 1986; von Davier et al., 2004; Wang et al., 2008). Wang et al. (2008) suggested that frequency estimation's strong assumption of conditional distributions being equal across the two populations is responsible for the difference in equating error between these methods. As the proportion of common items increased, systematic, random, and total equating errors all decreased, but the decline was much more evident for systematic error than for random error. These results replicate findings from several studies in which bias increased with decreases in the relative length of the common-item set (see, e.g., Holland et al., 2008; Hou, 2007; Petersen et al., 1983; Ricker & von Davier, 2007; Wang et al., 2008). Hou (2007) attributed results of this nature to increasing reliability between the common-item scores and the total test scores as the common-item ratio increases. As test forms differed more in average difficulty, systematic, random, and total errors all increased, but the increase for systematic error was much larger than that for random error. Unlike the results given in this dissertation, Hou (2007) and Sinharay and Holland (2006) found that form difficulty differences had almost no effect on systematic, random, and total errors. This difference probably arises from interactions with other factors included in this study. Whenever 20% internal common-item sets were used with a large form difference of .2, bias became strikingly larger compared to other conditions. Therefore, when conditions with form differences of .2 were averaged, aggregated bias spiked compared to that for conditions in which form difficulties were similar. Kolen and Brennan (2004) noted that the assumptions required for the common item nonequivalent groups design are strong, and high equating accuracy might not be achieved when forms differ substantially in difficulty. When the common items were external, systematic, random, and total equating errors were smaller than when the common items were internal, and this was especially
true for systematic error. Internal versus external common-item sets have never been dealt with in the equating literature. This finding may have been due to differences in the difficulties of the uncommon (i.e., unique) items in the old and new forms. To elaborate, consider a situation in which the average difficulties of the old and new forms equal 0 and .2, respectively (see Figure B12). Because the common items are common to both forms, the average difficulty of the common-item set always equals 0 (i.e., equal to the average difficulty of the old form, Form Y). For internal common items, the average difficulty of the unique items has to be larger than .2 to satisfy the .2 average of the new form, Form X (for instance, with 20% internal common items, the unique items would need an average difficulty of .2/.8 = .25). For external common items, the average difficulty of the unique items still equals .2 because the common items are not part of the whole form (Form X). As a result, the degree of difficulty adjustment to be made by equating through a common-item set is always more severe with internal common-item sets than with external ones. This is inevitable because the common items will be common to the old and new forms in any situation where the common item nonequivalent groups design is used. Hence, when the forms are quite different in average difficulty, equating with an external common-item set would produce smaller bias and total equating error than equating with an internal common-item set. There are also two other issues pertaining to the use of internal versus external common-item sets (Kolen & Brennan, 2004). First, for internal sets, context effects are more problematic than for external sets, which are usually administered in a separately timed section. Second, the issue of structural zeroes arises only with the use of internal sets, and specifically when frequency estimation is combined with log-linear presmoothing. For the bivariate distribution of the internal common-item scores and total scores, some pairs of scores cannot occur; for example, the common-item score cannot be larger than the total score. When the bivariate log-linear presmoothing model is fitted to the observed bivariate distribution, the impossible pairs of scores (i.e., structural zeroes) will be given some positive probabilities, albeit
usually very small. In this dissertation, the approach taken by Brennan et al. (2009) was employed to deal with this issue; roughly speaking, it assigns zero probabilities to the structural zero cells, but in so doing it loses the moment preservation property of the log-linear presmoothing to some extent. Use of an external common-item set does not have this issue. A more detailed discussion of this issue can be found in Kim (in progress). Differences in the spread of difficulty parameters in the common-item sets also had effects on equating error. When only medium-difficulty items were included in the common-item sets (i.e., midi common items), systematic and total errors were smaller. When mini common-item sets were used, random error was slightly smaller. This finding is consistent with the results from Sinharay and Holland (2006, 2007). Liu et al. (2011) explained this relationship by noting that the equating errors differ because (a) the mini sets attenuate the real group differences and (b) the content specifications are satisfied better with midi sets. However, it should be noted that the use of midi common items might not be practical in some testing situations when content representation of the common-item sets requires a greater spread of item difficulties.

Research Question 2

2. How do smoothing methods within the equating methods (i.e., FE, MFE, and CE) compare in equating error when the examinees vary?

The second main research question was focused on the performance of smoothing methods in relation to differences in examinees. Two aspects of examinees were considered, sample size and group mean ability differences, as reflected in the subquestions that follow.

To what extent does the performance of smoothing methods vary depending on the sample size?

With a sample size of 300, cubic spline postsmoothing produced smaller total equating error than did log-linear presmoothing for most conditions. Antal et al. (2011) and Liu and Kolen (2011) also reported smaller total error with cubic spline postsmoothing than with log-linear presmoothing for sample sizes of 2,000 or less. In relation to small samples, Colton (1995) concluded that with 400 or fewer examinees, no equating (the identity method) might be preferred. Had no equating been a consideration in this study, Colton's results might have been supported. For a sample size of 6,000, log-linear presmoothing yielded smaller total error than did cubic spline postsmoothing for a majority of conditions. This result also is consistent with those from Antal et al. (2011), Hanson (1990), and Hanson et al. (1994). However, in Cui (2006), cubic spline postsmoothing produced the smallest total error regardless of the sample size used. One possible explanation for the inconsistency in these results may be that the effects were confounded with other factors not considered. Overall, the performance of the different smoothing methods became more similar as sample size increased. This was evident not only in the aggregated summary statistics (Tables A5 through A7), but also in the difference range plots (Figure 4.9). With fairly large samples, the selection of the smoothing method did not lead to considerable differences in the amount of total equating error. The difference between the unsmoothed and smoothed methods also became smaller with larger sample sizes, indicating that smoothing with relatively large samples was not as effective as it was with smaller sample sizes. Kolen and Brennan (2004) noted that random error becomes smaller with enlarged sample sizes, and even inconsequential for very large samples. Hence, the random errors being fairly small even for the unsmoothed methods probably led to less effectiveness from smoothing. These results are consistent with findings from many previous studies showing that smoothing was most beneficial when equating was done with small samples (see, e.g., Cope & Kolen, 1990; Cui, 2006; Cui & Kolen, 2009; Hanson, 1990; Hanson, 1991; Hanson et al., 1994; Liu & Kolen, 2011;
Zeng, 1995). Nevertheless, even with the largest sample size (i.e., 6,000), smoothing did contribute somewhat to obtaining a more accurate estimate of the equating relationship in the present study. This result may have occurred in part because of the use of aggregated summary statistics: even if total error increased with smoothing under some conditions, averaging might have masked such occurrences. In general, equating error depended more on the overall equating method than on the smoothing method used. Regardless of the sample size and smoothing method, chained equipercentile equating produced the smallest total error. Antal et al. (2011) and Marco et al. (1983) found similar patterns. This result probably was a byproduct of the aggregated summary statistics. Frequency estimation is more susceptible to larger group differences, whereas chained equipercentile equating is not. Therefore, averaging across conditions might have caused the errors to be smaller for chained equipercentile equating. Chained equipercentile equating outperformed the other equating methods for the other main factors as well, and the same argument applies to results for subsequent research questions.

To what extent does the performance of smoothing methods vary depending on the group mean differences (i.e., effect size)?

As absolute group mean differences increased, log-linear presmoothing tended to perform better than cubic spline postsmoothing in terms of total error, except in a majority of conditions with a group difference of -.2. No previous studies included a condition in which a less-able group was administered a more difficult form. However, Antal et al. (2011) reported that postsmoothing outperformed presmoothing with a group difference of 0, and that presmoothing outperformed postsmoothing with groups differing by .5. The reasons for presmoothing producing smaller total error with larger group differences are unclear. Differences in total equating error between the unsmoothed and smoothed methods declined as group mean differences increased, indicating that smoothing was helpful in all cases but less effective when groups varied substantially in average proficiency. This
pattern was more evident with frequency estimation than with chained equipercentile equating because of the rapid increase in systematic error (i.e., weighted absolute bias). These findings are important but warrant further investigation because this is the only study to date that included group difference as a factor while comparing presmoothing and postsmoothing methods. As with different sample sizes, effects of group differences were governed more by the equating method than by the smoothing method used. When the two groups had similar abilities (i.e., a group difference of .05), modified frequency estimation produced the smallest total error independent of the smoothing method used. When group differences were larger than .2, chained equipercentile equating demonstrated the smallest total equating error, followed by modified frequency estimation and frequency estimation. These results support conclusions from previous research that chained equipercentile equating is more robust to group differences than is frequency estimation (e.g., Hagge & Kolen, 2011; Harris & Kolen, 1990; Kim & Lee, 2009; Kolen, 1990; Lawrence & Dorans, 1990; Livingston et al., 1990; Marco et al., 1983; Powers et al., 2011; Powers & Kolen, 2011; Schmitt et al., 1990; Sinharay & Holland, 2007; Skaggs & Lissitz, 1986; von Davier et al., 2004; Wang et al., 2008). Wang et al. (2008) suggested that the difference originates from frequency estimation's strong assumption that the two populations have equal conditional distributions (see Chapter 2). In addition, and consistent with Kolen (1990) and von Davier et al. (2004), the amounts of equating error produced by the different equating methods were similar when the groups administered the old and new forms were similar in average ability. A mathematical proof of why the different equating methods give similar equating errors with equivalent groups can be found in von Davier et al. (2004). Taking the findings for Research Question 2 as a whole, we can conclude that variations in examinees in terms of sample size and/or group differences did affect the performance of smoothing methods. Log-linear presmoothing performed well when the sample size was 6,000 and when group differences increased. Cubic spline postsmoothing
outperformed presmoothing when the sample size equaled 300 and the group difference was -.2 (i.e., with a relatively large difference in average proficiency and an easier form administered to a more-able group).

Research Question 3

3. How do various smoothing methods within the equating methods (i.e., FE, MFE, and CE) compare in equating error when the test forms vary?

The third main research question was focused on the effect of differences in test forms on the performance of smoothing methods. Four characteristics of differences in test forms were considered: the proportion of common items, average form difficulty differences, common-item type (i.e., internal or external), and spread of difficulty parameters in the common-item sets. These characteristics gave rise to the four subquestions that follow.

To what extent does the performance of smoothing methods vary depending on the proportion of common items?

With 20% common items, cubic spline postsmoothing performed better than log-linear presmoothing with regard to total equating error; however, with 40% common items, the opposite was true. Similar patterns were observed in Antal et al. (2011). As the proportion of common items was doubled from 20% to 40%, the reduction in error for smoothed relative to unsmoothed methods became more evident, and this tendency was most prominent with chained equipercentile equating, followed by modified frequency estimation and frequency estimation. Antal et al. (2011) included the proportion of common items to compare the relative performance of smoothing methods, but their results for unsmoothed methods were not provided. Hence, making a comparison with the present study is not possible.

To what extent does the performance of smoothing methods vary depending on the form difficulty differences?

When the old and new forms were fairly similar in average difficulty (i.e., a form difference of .05), cubic spline postsmoothing yielded smaller total error under a greater number of conditions than did log-linear presmoothing, but as the form difficulty difference increased to .2, the opposite was true. Similar patterns were found in Antal et al. (2011). As forms became more different in average difficulty, reductions in total error due to smoothing became smaller. With larger test form differences, the effectiveness of smoothing also did not vary much with the use of different equating methods. This is a relationship investigated only in the present study and worthy of confirmation in future studies.

To what extent does the performance of smoothing methods vary depending on the types of common items, i.e., internal or external?

Regardless of the common-item type, log-linear presmoothing yielded smaller total error than cubic spline postsmoothing under most conditions. The common-item sets being either internal or external did not have a direct influence on the relative performance of the smoothing methods. Also, the difference between the total error for the unsmoothed and smoothed methods was larger when the common items were external. These again are findings unique to the present study and worthy of further exploration.

To what extent does the performance of smoothing methods vary depending on the spread of difficulty parameters in the common-item sets?

With mini common-item sets, cubic spline postsmoothing performed better than log-linear presmoothing under a majority of conditions, but the opposite was true with midi common items. However, the effectiveness of smoothing did not differ much depending on the spread of difficulty parameters in the common-item sets. The effect of mini versus midi sets on smoothing performance had not been studied in the past, so no supporting or opposing literature exists.
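As a rough illustration of what the mini/midi distinction means operationally, the sketch below selects, from a hypothetical pool of difficulty (b) parameters, a midi set restricted to medium-difficulty items and a mini set intended to mirror the full form's spread. This is not the construction procedure actually used in the study; the pool, the selection rules, and the 20% common-item count are illustrative assumptions.

```r
# Illustrative sketch only (not the study's form-construction procedure):
# contrast a "midi" common-item set (medium difficulties only) with a "mini"
# set whose difficulty spread roughly mirrors the full form.
set.seed(1)
b_form <- rnorm(50, mean = 0, sd = 0.87)     # full-form difficulties (illustrative)
n_ci   <- 10                                 # 20% common items

# midi: the n_ci items closest to the median difficulty (restricted spread)
midi_idx <- order(abs(b_form - median(b_form)))[1:n_ci]

# mini: evenly spaced order statistics as a crude way to mirror the full spread
mini_idx <- order(b_form)[round(seq(1, length(b_form), length.out = n_ci))]

c(sd_full = sd(b_form), sd_mini = sd(b_form[mini_idx]), sd_midi = sd(b_form[midi_idx]))
```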

To summarize, differences between the old and new forms in terms of the proportion of common items, average form difficulty, common-item type, and the spread of difficulty parameters in the common-item set had specific effects on the performance of each smoothing method. Cubic spline postsmoothing performed better than log-linear presmoothing when the proportion of common items was relatively small (i.e., 20%), average form difficulties were similar (i.e., a difference of .05), and the common-item set was a mini version of the full forms. Log-linear presmoothing tended to perform better than cubic spline postsmoothing with a larger proportion of common items (i.e., 40%), a larger average form difficulty difference (i.e., .2), and midi common-item sets. Common items being internal or external did not have a substantial impact on the relative performance of the smoothing methods.

Research Question 4

4. How do various smoothing methods affect equating error in conjunction with different equating methods and different smoothing parameters?

The fourth main research question was broken into two sub-questions that focused on the effects of smoothing methods within each equating method and on how smoothing parameters affected the equating results.

To what extent does the performance of smoothing methods vary with the equating method used (i.e., FE, MFE, CE, and KE)?

Log-linear presmoothing produced smaller total error than cubic spline postsmoothing when combined with frequency estimation and chained equipercentile equating. However, with modified frequency estimation, the opposite was true. Results confirm Antal et al.'s (2011) recommendation that postsmoothing be paired with chained equipercentile equating. Kernel equating also performed particularly well when combined with the chained equipercentile method, partially supporting results given in Moses and Holland (2007), in which the kernel equating method provided equivalent or slightly smaller
total error than chained equipercentile equating paired with log-linear presmoothing. The finding of postsmoothing outperforming presmoothing for modified frequency estimation is unique to the present study. The difference in the amount of total error between the unsmoothed and smoothed methods was largest with chained equipercentile equating, whereas frequency estimation showed the smallest difference. The complexity of the research design and the averaging of the aggregated summary statistics may account for this result. As mentioned in the previous section, chained equipercentile equating is less susceptible to larger group differences than is frequency estimation. Therefore, once results are aggregated across conditions, smoothing appears less effective with frequency estimation, whose total error is dominated by the enlarged systematic error caused by larger group differences, than with chained equipercentile equating.

To what extent does the performance of smoothing methods vary with different smoothing parameters?

For cubic spline postsmoothing, total equating error decreased as the smoothing parameter increased from .1 to .5 (i.e., with more smoothing). Although the amount of bias increased with more smoothing, on average the decrease in standard error compensated for the increased bias, resulting in reduced total error. Similar results have been found by other researchers (e.g., Cui, 2006; Cui & Kolen, 2009; Liu & Kolen, 2011; Zeng, 1995). This finding probably emerged in the present study because of the range of the smoothing parameters used. If larger postsmoothing parameter values such as S = .75 had been used, the averaged results might have shown that total equating error does not substantially decrease, or even increases, with postsmoothing applied (e.g., Liu & Kolen, 2011). With procedures involving log-linear presmoothing (including kernel equating), the number of parameters preserved in the marginal distributions had a larger effect on the amount of total error than did the number of parameters preserved for the cross-product moments. Total error was smaller when the marginal distributions were smoothed more
strongly (assuming the number of preserved cross-product moments in the bivariate distribution is fixed). These results are supported by previous studies conducted under the random groups design (e.g., Cui, 2006; Cui & Kolen, 2009; Hanson et al., 1994; Liu & Kolen, 2011). When the number of preserved moments in the marginal distributions was fixed, total error became smaller with more cross products preserved in the bivariate distribution (i.e., with less smoothing applied). Consequently, for the procedures involving log-linear presmoothing and kernel equating, the parameter set (4, 4, 3) produced the smallest total equating error. This result is inconsistent with Hanson (1991), in which the parameter set (3, 3, 1) produced more accurate estimates of the equating relationships under a majority of conditions than did (4, 4, 3). Had the parameter set (3, 3, 1) been considered in the current study, the results might have been consistent with his.

Limitations and Future Research

The most obvious limitation of this study is that the content of the test forms was not a consideration. The constructed forms only statistically represented the testing conditions of interest, including the characteristics of the common items. For instance, the mini common-item sets were mini versions of the full forms with regard to the average a, b, and c item parameters and the range and standard deviation of the difficulty (b) parameters, but the content measured by each item was not considered when the common-item sets were constructed. Hence, considering the content representation of the common items is desirable. For some content areas, all items could inevitably be extremely difficult because of the topics being covered; in such cases, the use of midi common items might not be realistic. By taking content representation into account, the findings would have more practical implications. Second, this study focused only on unrounded raw scores. However, in practice, rounded transformed scale scores are usually reported to users. Therefore, a replication of
this investigation with rounded scale scores would be necessary. Cui (2006) stated that, although it is not guaranteed, the smoothing methods that perform best for unrounded raw scores under the random groups design are expected to also perform best for rounded scores. Third, this dissertation is based solely on a simulation study. Although item parameters estimated from real test responses were used to construct the fixed test forms, the findings would be stronger if corroborated with real data analyses. Fourth, due to the inclusion of so many other conditions, the number of smoothing parameters (or parameter sets) considered in this study was quite limited. This was especially true when log-linear presmoothing was combined with chained equipercentile equating, for which only two sets of parameters were examined because bivariate smoothing was not conducted. In practice, eight to nine possibilities are often investigated for each smoothing method. Hence, a future study might include varying forms with different item parameters for each replication and a more extensive set of smoothing parameters to more closely reflect operational equating situations. Fifth, only three sample size conditions (300, 2,000, and 6,000) were included, with results varying more between sample sizes of 300 versus 2,000 than between sample sizes of 2,000 versus 6,000. In practical situations, sample sizes often fall between 300 and 2,000, making in-between values worthy of further exploration. Sixth, the simulated data in this study included only dichotomous responses. Future research on mixed-format test forms might be of interest to researchers and practitioners. By extending the study to mixed-format tests, the common-item conditions controlled in the simulation procedure would be more complex; common-item sets could be constructed solely with dichotomous items or with a mixture of dichotomous and polytomous items. Including such conditions would extend the applicability of the findings to a broader range of testing conditions.
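As a point of reference for the kind of simulation described above, the following minimal R sketch generates dichotomous responses from a 3PL model. The parameter-generating distributions, the 1.7 scaling constant, and the group mean of .2 are illustrative assumptions rather than the dissertation's actual specifications.

```r
# Minimal sketch (assumed setup, not the dissertation's code): dichotomous
# responses generated from a three-parameter logistic (3PL) model.
set.seed(42)
n_items   <- 50
n_persons <- 2000
a_par <- rlnorm(n_items, meanlog = -0.2, sdlog = 0.3)   # discriminations (illustrative)
b_par <- rnorm(n_items, mean = 0, sd = 0.87)            # difficulties (illustrative)
c_par <- rbeta(n_items, 8, 22)                          # pseudo-guessing (illustrative)
theta <- rnorm(n_persons, mean = 0.2, sd = 1)           # new-group proficiencies (illustrative)

# 3PL probability of a correct response (1.7 scaling constant assumed)
p <- t(sapply(theta, function(th)
  c_par + (1 - c_par) / (1 + exp(-1.7 * a_par * (th - b_par)))))

resp   <- matrix(rbinom(length(p), 1, p), nrow = n_persons)  # 0/1 response matrix
scores <- rowSums(resp)                                      # raw scores used for equating
```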

Lastly, a more detailed study focusing on the characteristics of common-item sets is called for. For instance, an investigation using a more targeted design directly aimed at comparing internal and external common items could be conducted, with emphasis on a possible interaction between the proportion of unique items and the difference in average form difficulties.

Conclusions and Implications

The purpose of this study was to compare the performance of various smoothing methods under varying testing conditions. Effects of various factors on the performance of smoothing methods were scrutinized in relation to bias, standard error, and root mean square error, which capture systematic, random, and total errors in equating, respectively. The most effective smoothing methods for all combinations of testing conditions considered are provided in Figures B1 to B8. The most important conclusions drawn from the results are as follows.

1. Polynomial log-linear presmoothing and cubic spline postsmoothing both showed promise in reducing total equating error under the common item nonequivalent groups design. When the smoothing parameters producing the smallest amount of total error were used, there was always at least one smoothing method that produced smaller total error than the unsmoothed method.

2. Overall, in terms of systematic and total equating error, polynomial log-linear presmoothing tended to perform better than cubic spline postsmoothing for the conditions investigated in this dissertation.

3. Cubic spline postsmoothing showed a strong tendency to produce the least amount of random error, particularly when the sample sizes and/or group differences were quite small.

4. With the chained equipercentile method, the use of kernel equating produced smaller systematic and total error than either presmoothing or postsmoothing in
most cases. Hence, depending on the equating method used, the implementation of either log-linear presmoothing or kernel equating is preferred.

5. Regardless of the equating/smoothing procedures used, the differences in equating error among the various smoothing methods decreased as the sample size increased. Therefore, having a sample size of 2,000 or larger might guarantee relatively small differences between smoothing methods.

6. The selection of equating method had a much larger impact on the amount of equating error than did the selection of smoothing method. On average, chained equipercentile equating outperformed the frequency estimation and modified frequency estimation methods in terms of total error produced.

7. An argument could be made that the differences shown among the different smoothing methods might not be practically meaningful, since a majority of them were less than one-half a score point (see Figure 4.9). Nevertheless, if raw scores are converted to nonlinearly transformed scale scores and rounding is applied for the purpose of score reporting, the differences could become significant ones (Cui, 2006).

8. Even though the differences are not strikingly large, different test conditions did have an influence on the relative performance of the various smoothing methods. The selection of the smoothing method should be made in relation to the specific testing condition under consideration, and by carefully examining both aggregate and conditional summary statistics.
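As a concrete point of reference for the polynomial log-linear presmoothing referred to throughout these conclusions, the following minimal R sketch fits a univariate log-linear model to a hypothetical raw-score frequency distribution via Poisson regression. The frequencies and the choice of preserving the first three moments (C = 3) are illustrative assumptions; the operational procedures in this study additionally involved bivariate smoothing, and the kernel equating framework continuizes distributions presmoothed in this way with Gaussian kernels (von Davier et al., 2004).

```r
# Minimal univariate sketch (illustrative frequencies, not the study's data):
# polynomial log-linear presmoothing fitted by Poisson regression. A model with
# polynomial terms up to degree C preserves the first C moments of the
# observed score distribution.
set.seed(7)
K      <- 50
scores <- 0:K
freq   <- as.vector(table(factor(rbinom(2000, K, 0.6), levels = scores)))

C   <- 3                                      # moments to preserve (assumed)
fit <- glm(freq ~ poly(scores, C), family = poisson)

smoothed <- fitted(fit)                       # presmoothed frequencies
# check moment preservation: observed and smoothed raw moments agree
sapply(1:C, function(r)
  c(observed = sum(freq * scores^r) / sum(freq),
    smoothed = sum(smoothed * scores^r) / sum(smoothed)))
```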

APPENDIX A

TABLES

Table A1. Factors Controlled in the Simulation.

Factor                                               Condition
Sample size                                          300, 2,000, 6,000
Difference in population mean ability (old & new)   (0, 1) & (.05, 1), (0, 1) & (.2, 1),
                                                     (0, 1) & (.5, 1), (0, 1) & (-.2, 1)
Proportion of common items                           20%, 40%
Difference in form difficulty                        .05, .2
Type of common items                                 Internal, External
Common-item difficulty spread                        Mini, Midi
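The 192 simulated conditions arise from fully crossing the factor levels in Table A1 (3 x 4 x 2 x 2 x 2 x 2 = 192). A minimal R sketch of such a design grid is shown below; the variable names are illustrative rather than the study's actual code.

```r
# Minimal sketch: fully crossed simulation conditions corresponding to Table A1.
conditions <- expand.grid(
  sample_size = c(300, 2000, 6000),
  group_diff  = c(0.05, 0.2, 0.5, -0.2),   # new-group mean; old group fixed at 0
  prop_common = c(0.20, 0.40),
  form_diff   = c(0.05, 0.20),
  ci_type     = c("internal", "external"),
  b_spread    = c("mini", "midi")
)
nrow(conditions)   # 192
```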

Table A2. Previous Comparison Studies Conducted Under the CINEG Design.

Studies compared: Hanson (1991); Moses & Holland (2007); Antal et al. (2011)

Equating methods examined relevant to this study: FE * * *; CE * *; KE with FE *; KE with CE *
Smoothing methods examined relevant to this study: Log-linear presmoothing * * *; Cubic spline postsmoothing *
Factors examined relevant to this study: Sample size * * *; Group difference * *; % common items *; Form difference *; Smoothing parameter * *

Key findings. Hanson (1991): The log-linear presmoothing method produced the most accurate equating functions under most of the conditions investigated. Moses & Holland (2007): The use of FE/CE versus KE did not exhibit largely different results when the distributions were strongly smoothed. Antal et al. (2011): Cubic spline postsmoothing performs better for CE, but log-linear presmoothing and cubic spline postsmoothing perform similarly for FE.

119 107 Table A3. Mean and Standard Deviation of Item Parameters. Form X (New) Common Items Form Y (Old) a b c a b c a b c Mini Internal CI20 bdiff (0.31) 0.05 (0.87) 0.26 (0.12) 0.84 (0.44) 0.00 (0.86) 0.30 (0.12) 0.86 (0.32) 0.00 (0.87) 0.28 (0.11) bdiff (0.32) 0.20 (0.86) 0.25 (0.12) 0.84 (0.44) 0.00 (0.86) 0.30 (0.12) 0.86 (0.32) 0.00 (0.87) 0.28 (0.11) CI40 bdiff (0.29) 0.05 (0.87) 0.25 (0.11) 0.78 (0.32) 0.00 (0.86) 0.27 (0.11) 0.83 (0.29) 0.00 (0.87) 0.28 (0.11) bdiff (0.26) 0.20 (0.86) 0.24 (0.11) 0.78 (0.33) 0.00 (0.86) 0.27 (0.11) 0.83 (0.29) 0.00 (0.87) 0.28 (0.11) External CI20 bdiff (0.22) 0.05 (0.84) 0.26 (0.11) 0.87 (0.39) 0.00 (0.84) 0.27 (0.11) 0.82 (0.25) 0.00 (0.84) 0.27 (0.12) bdiff (0.31) 0.20 (0.84) 0.27 (0.12) 0.87 (0.39) 0.00 (0.84) 0.27 (0.11) 0.87 (0.32) 0.00 (0.84) 0.28 (0.11) CI40 bdiff (0.27) 0.05 (0.89) 0.26 (0.11) 0.81 (0.34) 0.00 (0.89) 0.28 (0.09) 0.81 (0.26) 0.00 (0.90) 0.27 (0.11) bdiff (0.30) 0.20 (0.89) 0.26 (0.11) 0.81 (0.34) 0.00 (0.89) 0.28 (0.09) 0.81 (0.26) 0.00 (0.90) 0.27 (0.11) Midi Internal CI20 bdiff (0.28) 0.05 (0.87) 0.28 (0.10) 0.88 (0.33) (0.44) 0.30 (0.08) 0.86 (0.32) 0.00 (0.87) 0.28 (0.11) bdiff (0.33) 0.20 (0.87) 0.26 (0.11) 0.89 (0.36) 0.00 (0.43) 0.30 (0.07) 0.86 (0.32) 0.00 (0.87) 0.28 (0.11) CI40 bdiff (0.31) 0.05 (0.80) 0.26 (0.11) 0.81 (0.31) 0.00 (0.39) 0.24 (0.08) 0.82 (0.27) 0.00 (0.80) 0.25 (0.11) bdiff (0.28) 0.20 (0.87) 0.26 (0.10) 0.79 (0.30) 0.00 (0.43) 0.29 (0.10) 0.83 (0.29) 0.00 (0.87) 0.28 (0.11) External CI20 bdiff (0.22) 0.05 (0.84) 0.26 (0.11) 0.82 (0.35) 0.00 (0.42) 0.24 (0.10) 0.87 (0.32) 0.00 (0.84) 0.28 (0.11)

120 108 Table A3. Continued. CI40 bdiff.20 bdiff.05 bdiff (0.29) 0.81 (0.27) 0.81 (0.29) 0.20 (0.84) 0.05 (0.89) 0.20 (0.89) 0.27 (0.13) 0.26 (0.11) 0.25 (0.11) 0.82 (0.35) 0.80 (0.33) 0.80 (0.33) 0.00 (0.42) 0.00 (0.45) 0.00 (0.45) 0.24 (0.10) 0.27 (0.09) 0.27 (0.09) 0.87 (0.32) 0.81 (0.26) 0.81 (0.26) 0.00 (0.84) 0.00 (0.90) 0.00 (0.90) 0.28 (0.28) 0.27 (0.11) 0.27 (0.11) Note. The values on the first row of each cell are means, and the values in the parentheses are standard deviations; CI20=20% common items; CI40=40% common items; bdiff=form difference.

121 109 Table A4. Descriptive Statistics of Item Difficulty ( b ) Parameters. Form X (New) Common Items Form Y (Old) Mean SD Min Max Mean SD Min Max Mean SD Min Max Mini Midi Internal External Internal External CI20 CI40 CI20 CI40 CI20 CI40 CI20 CI40 bdiff bdiff bdiff bdiff bdiff bdiff bdiff bdiff bdiff bdiff bdiff bdiff bdiff bdiff bdiff bdiff Note. CI20=20% common items; CI40=40% common items; bdiff=form difference.

122 110 Table A5. Aggregated Weighted Absolute Bias for Different Sample Sizes. N 300 Frequency Estimation N 2,000 N 6,000 Unsmoothed (100) (100) (100) Post (101) (100) (100) Pre (98) (100) (100) KE (98) (100) (100) Modified Frequency Estimation Unsmoothed (100) (100) (100) Post (100) (98) (99) Pre (97) (99) (100) Chained Equipercentile Equating Unsmoothed (100) (100) (100) Post (101) (100) (100) Pre (101) (100) (100) KE (100) (99) (99) Note. Bold font indicates the smallest value within a specific equating method for each condition; shaded font indicates that the value was increased with smoothing; values in parentheses are the percentage of the statistic for smoothed divided by the statistic for unsmoothed method; KE=kernel equating, Post=cubic spline postsmoothing, Pre=polynomial log-linear presmoothing.

123 111 Table A6. Aggregated Weighted Standard Error for Different Sample Sizes. N 300 Frequency Estimation N 2,000 N 6,000 Unsmoothed (100) (100) (100) Post (55) (60) (63) Pre (62) (65) (65) KE (62) (65) (65) Modified Frequency Estimation Unsmoothed (100) (100) (100) Post (56) (61) (63) Pre (66) (67) (67) Chained Equipercentile Equating Unsmoothed (100) (100) (100) Post (53) (58) (61) Pre (61) (63) (63) KE (61) (62) (62) Note. Bold font indicates the smallest value within a specific equating method for each condition; values in parentheses are the percentage of the statistic for smoothed divided by the statistic for unsmoothed method; KE=kernel equating, Post=cubic spline postsmoothing, Pre=polynomial log-linear presmoothing.

124 112 Table A7. Aggregated Weighted Root Mean Square Error for Different Sample Sizes. N 300 Frequency Estimation N 2,000 N 6,000 Unsmoothed (100) (100) (100) Post (88) (97) (99) Pre (86) (96) (99) KE (86) (96) (99) Modified Frequency Estimation Unsmoothed (100) (100) (100) Post (82) (93) (96) Pre (82) (93) (98) Chained Equipercentile Equating Unsmoothed (100) (100) (100) Post (75) (91) (96) Pre (77) (90) (95) KE (76) (89) (94) Note. Bold font indicates the smallest value within a specific equating method for each condition; values in parentheses are the percentage of the statistic for smoothed divided by the statistic for unsmoothed method; KE=kernel equating, Post=cubic spline postsmoothing, Pre=polynomial log-linear presmoothing.

125 113 Table A8. Aggregated Weighted Absolute Bias for Different Group Differences. Group Diff =.05 Group Diff =.2 Frequency Estimation Group Diff =.5 Group Diff = -.2 Unsmoothed (100) (100) (100) (100) Post (101) (100) (100) (98) Pre (100) (99) (99) (99) KE (100) (99) (99) (99) Modified Frequency Estimation Unsmoothed (100) (100) (100) (100) Post (98) (98) (99) (101) Pre (100) (98) (98) (99) Chained Equipercentile Equating Unsmoothed (100) (100) (100) (100) Post (101) (101) (100) (100) Pre (100) (101) (100) (100) KE (100) (100) (98) (100) Note. Bold font indicates the smallest value within a specific equating method for each condition; shaded font indicates that the value was increased with smoothing; values in parentheses are the percentage of the statistic for smoothed divided by the statistic for unsmoothed method; KE=kernel equating, Post=cubic spline postsmoothing, Pre=polynomial log-linear presmoothing.

126 114 Table A9. Aggregated Weighted Standard Error for Different Group Differences. Group Diff =.05 Group Diff =.2 Frequency Estimation Group Diff =.5 Group Diff = -.2 Unsmoothed (100) (100) (100) (100) Post (56) (56) (58) (59) Pre (64) (63) (61) (65) KE (64) (62) (61) (65) Modified Frequency Estimation Unsmoothed (100) (100) (100) (100) Post (57) (57) (60) (59) Pre (67) (66) (64) (68) Chained Equipercentile Equating Unsmoothed (100) (100) (100) (100) Post (55) (53) (57) (55) Pre (63) (61) (62) (63) KE (62) (60) (61) (62) Note. Bold font indicates the smallest value within a specific equating method for each condition; values in parentheses are the percentage of the statistic for smoothed divided by the statistic for unsmoothed method; KE=kernel equating, Post=cubic spline postsmoothing, Pre=polynomial log-linear presmoothing.

127 115 Table A10. Aggregated Weighted Root Mean Square Error for Different Group Differences. Group Diff =.05 Group Diff =.2 Frequency Estimation Group Diff =.5 Group Diff = -.2 Unsmoothed (100) (100) (100) (100) Post (89) (94) (97) (91) Pre (88) (92) (95) (93) KE (88) (92) (95) (93) Modified Frequency Estimation Unsmoothed (100) (100) (100) (100) Post (86) (88) (92) (88) Pre (87) (89) (92) (90) Chained Equipercentile Equating Unsmoothed (100) (100) (100) (100) Post (86) (84) (88) (82) Pre (83) (84) (88) (86) KE (83) (83) (87) (85) Note. Bold font indicates the smallest value within a specific equating method for each condition; values in parentheses are the percentage of the statistic for smoothed divided by the statistic for unsmoothed method; KE=kernel equating, Post=cubic spline postsmoothing, Pre=polynomial log-linear presmoothing.

128 116 Table A11. Aggregated Weighted Absolute Bias for Different Proportions of Common Items. 20% Common Items 40% Common Items Frequency Estimation Unsmoothed (100) (100) Post (100) (101) Pre (100) (98) KE (100) (99) Modified Frequency Estimation Unsmoothed (100) (100) Post (98) (102) Pre (99) (98) Chained Equipercentile Equating Unsmoothed (100) (100) Post (100) (101) Pre (100) (101) KE (98) (101) Note. Bold font indicates the smallest value within a specific equating method for each condition; shaded font indicates that the value was increased with smoothing; values in parentheses are the percentage of the statistic for smoothed divided by the statistic for unsmoothed method; KE=kernel equating, Post=cubic spline postsmoothing, Pre=polynomial log-linear presmoothing.

129 117 Table A12. Aggregated Weighted Standard Error for Different Proportions of Common Items. 20% Common Items 40% Common Items Frequency Estimation Unsmoothed (100) (100) Post (58) (57) Pre (66) (60) KE (66) (60) Modified Frequency Estimation Unsmoothed (100) (100) Post (59) (58) Pre (70) (63) Chained Equipercentile Equating Unsmoothed (100) (100) Post (57) (54) Pre (67) (57) KE (65) (57) Note. Bold font indicates the smallest value within a specific equating method for each condition; values in parentheses are the percentage of the statistic for smoothed divided by the statistic for unsmoothed method; KE=kernel equating, Post=cubic spline postsmoothing, Pre=polynomial log-linear presmoothing.

130 118 Table A13. Aggregated Weighted Root Mean Square Error for Different Proportions of Common Items. 20% Common Items 40% Common Items Frequency Estimation Unsmoothed (100) (100) Post (95) (91) Pre (96) (88) KE (96) (88) Modified Frequency Estimation Unsmoothed (100) (100) Post (91) (85) Pre (94) (81) Chained Equipercentile Equating Unsmoothed (100) (100) Post (90) (78) Pre (91) (74) KE (90) (74) Note. Bold font indicates the smallest value within a specific equating method for each condition; values in parentheses are the percentage of the statistic for smoothed divided by the statistic for unsmoothed method; KE=kernel equating, Post=cubic spline postsmoothing, Pre=polynomial log-linear presmoothing.

131 119 Table A14. Aggregated Weighted Absolute Bias for Different Form Differences. Form Diff =.05 Form Diff =.2 Frequency Estimation Unsmoothed (100) (100) Post (100) (100) Pre (99) (99) KE (100) (101) Modified Frequency Estimation Unsmoothed (100) (100) Post (96) (101) Pre (99) (99) Chained Equipercentile Equating Unsmoothed (100) (100) Post (101) (100) Pre (101) (100) KE (98) (101) Note. Bold font indicates the smallest value within a specific equating method for each condition; shaded font indicates that the value was increased with smoothing; values in parentheses are the percentage of the statistic for smoothed divided by the statistic for unsmoothed method; KE=kernel equating, Post=cubic spline postsmoothing, Pre=polynomial log-linear presmoothing.

132 120 Table A15. Aggregated Weighted Standard Error for Different Form Differences. Form Diff =.05 Form Diff =.2 Frequency Estimation Unsmoothed (100) (100) Post (55) (60) Pre (62) (64) KE (62) (64) Modified Frequency Estimation Unsmoothed (100) (100) Post (55) (61) Pre (65) (67) Chained Equipercentile Equating Unsmoothed (100) (100) Post (53) (58) Pre (61) (63) KE (60) (63) Note. Bold font indicates the smallest value for each condition; values in parentheses are the percentage of the statistic for smoothed divided by the statistic for unsmoothed method; KE=kernel equating, Post=cubic spline postsmoothing, Pre=polynomial loglinear presmoothing; Numbers after each method indicate the smoothing parameters used.

133 121 Table A16. Aggregated Weighted Root Mean Square Error for Different Form Differences. Form Diff =.05 Form Diff =.2 Frequency Estimation Unsmoothed (100) (100) Post (89) (96) Pre (90) (95) KE (90) (95) Modified Frequency Estimation Unsmoothed (100) (100) Post (79) (95) Pre (84) (93) Chained Equipercentile Equating Unsmoothed (100) (100) Post (75) (91) Pre (79) (90) KE (77) (89) Note. Bold font indicates the smallest value within a specific equating method for each condition; values in parentheses are the percentage of the statistic for smoothed divided by the statistic for unsmoothed method; KE=kernel equating, Post=cubic spline postsmoothing, Pre=polynomial log-linear presmoothing.

134 122 Table A17. Aggregated Weighted Absolute Bias for Different Common Item Types. Internal Common Items Frequency Estimation External Common Items Unsmoothed (100) (100) Post (100) (101) Pre (99) (99) KE (99) (99) Modified Frequency Estimation Unsmoothed (100) (100) Post (98) (103) Pre (99) (97) Chained Equipercentile Equating Unsmoothed (100) (100) Post (100) (102) Pre (100) (102) KE (99) (100) Note. Bold font indicates the smallest value within a specific equating method for each condition; shaded font indicates that the value was increased with smoothing; values in parentheses are the percentage of the statistic for smoothed divided by the statistic for unsmoothed method; KE=kernel equating, Post=cubic spline postsmoothing, Pre=polynomial log-linear presmoothing.

135 123 Table A18. Aggregated Weighted Standard Error for Different Common Item Types. Internal Common Items Frequency Estimation External Common Items Unsmoothed (100) (100) Post (64) (49) Pre (69) (56) KE (69) (56) Modified Frequency Estimation Unsmoothed (100) (100) Post (65) (51) Pre (73) (59) Chained Equipercentile Equating Unsmoothed (100) (100) Post (63) (46) Pre (67) (56) KE (66) (56) Note. Bold font indicates the smallest value within a specific equating method for each condition; values in parentheses are the percentage of the statistic for smoothed divided by the statistic for unsmoothed method; KE=kernel equating, Post=cubic spline postsmoothing, Pre=polynomial log-linear presmoothing.

136 124 Table A19. Aggregated Weighted Root Mean Square Error for Different Common Item Types. Internal Common Items Frequency Estimation External Common Items Unsmoothed (100) (100) Post (94) (93) Pre (94) (92) KE (94) (92) Modified Frequency Estimation Unsmoothed (100) (100) Post (90) (87) Pre (93) (82) Chained Equipercentile Equating Unsmoothed (100) (100) Post (89) (76) Pre (90) (76) KE (90) (74) Note. Bold font indicates the smallest value within a specific equating method for each condition; values in parentheses are the percentage of the statistic for smoothed divided by the statistic for unsmoothed method; KE=kernel equating, Post=cubic spline postsmoothing, Pre=polynomial log-linear presmoothing.

137 125 Table A20. Aggregated Weighted Absolute Bias for Different Spread of Difficulty Parameters in the Common Items. Mini Common Items Frequency Estimation Midi Common Items Unsmoothed (100) (100) Post (100) (100) Pre (99) (99) KE (99) (99) Modified Frequency Estimation Unsmoothed (100) (100) Post (101) (97) Pre (99) (99) Chained Equipercentile Equating Unsmoothed (100) (100) Post (100) (101) Pre (101) (100) KE (100) (98) Note. Bold font indicates the smallest value within a specific equating method for each condition; shaded font indicates that the value was increased with smoothing; values in parentheses are the percentage of the statistic for smoothed divided by the statistic for unsmoothed method; KE=kernel equating, Post=cubic spline postsmoothing, Pre=polynomial log-linear presmoothing.

138 126 Table A21. Aggregated Weighted Standard Error for Different Spread of Difficulty Parameters in the Common Items. Mini Common Items Frequency Estimation Midi Common Items Unsmoothed (100) (100) Post (56) (59) Pre (62) (64) KE (62) (64) Modified Frequency Estimation Unsmoothed (100) (100) Post (57) (59) Pre (66) (67) Chained Equipercentile Equating Unsmoothed (100) (100) Post (54) (57) Pre (62) (63) KE (61) (62) Note. Bold font indicates the smallest value within a specific equating method for each condition; values in parentheses are the percentage of the statistic for smoothed divided by the statistic for unsmoothed method; KE=kernel equating, Post=cubic spline postsmoothing, Pre=polynomial log-linear presmoothing.

139 127 Table A22. Aggregated Weighted Root Mean Square Error for Different Spread of Difficulty Parameters in the Common Items. Mini Common Items Frequency Estimation Midi Common Items Unsmoothed (100) (100) Post (93) (94) Pre (94) (92) KE (94) (93) Modified Frequency Estimation Unsmoothed (100) (100) Post (90) (88) Pre (90) (89) Chained Equipercentile Equating Unsmoothed (100) (100) Post (86) (85) Pre (87) (84) KE (86) (83) Note. Bold font indicates the smallest value within a specific equating method for each condition; values in parentheses are the percentage of the statistic for smoothed divided by the statistic for unsmoothed method; KE=kernel equating, Post=cubic spline postsmoothing, Pre=polynomial log-linear presmoothing.

140 128 Table A23. Aggregated Weighted Absolute Bias, Weighted Standard Error, and Weighted Root Mean Square Error for Different Smoothing Parameters. WAB WSE WRMSE Unsmoothed (100) (100) (100) Cubic Spline Postsmoothing Post (100) (78) (93) Post (102) (62) (90) Post (104) (57) (90) Polynomial Log-linear Presmoothing Pre (101) (64) (90) Pre (100) (64) (90) Pre (100) (71) (92) Pre (99) (71) (91) Kernel Equating KE (104) (63) (93) KE (104) (63) (93) KE (104) (71) (95) KE (104) (71) (94) Note. Bold font indicates the smallest value within a specific smoothing method for each index; shaded font indicates that the value was increased with smoothing; values in parentheses are the percentage of the statistic for smoothed divided by the statistic for unsmoothed method; KE=kernel equating, Post=cubic spline postsmoothing, Pre=polynomial log-linear presmoothing; Numbers after each method indicate the smoothing parameters used.

Table A24. Aggregated Weighted Absolute Bias, Weighted Standard Error, and Weighted Root Mean Square Error for Different Equating Methods. WAB Frequency Estimation WSE WRMSE Unsmoothed (100) (78) (93) Post (102) (62) (90) Pre (104) (57) (90) KE Modified Frequency Estimation Unsmoothed (101) (64) (90) Post (100) (64) (90) Pre (100) (71) (92) Chained Equipercentile Equating Unsmoothed (104) (63) (93) Post (104) (63) (93) Pre (104) (71) (95) KE (104) (71) (94) Note. Bold font indicates the smallest value within a specific equating method for each condition; shaded font indicates that the value was increased with smoothing; values in parentheses are the percentage of the statistic for smoothed divided by the statistic for unsmoothed method; KE=kernel equating, Post=cubic spline postsmoothing, Pre=polynomial log-linear presmoothing.

APPENDIX B

FIGURES

143 131 Figure B1. Smoothing Methods Producing the Smallest WRMSE for Different Sample Sizes. Note. Red, Yellow, and Green cells indicate cubic spline postsmoothing, polynomial loglinear presmoothing, and kernel equating, respectively; The first 3 columns display results for the sample size of 300, the next 3 columns for 2,000, and the last 3 columns for 6,000; Within each sample size, results for 3 equating methods including frequency estimation, modified frequency estimation, and chained equipercentile equating are given.

144 132 Figure B2. Smoothing Methods Producing the Smallest WAB for Different Group Differences. Note. White, Red, Yellow, and Green cells indicate unsmoothed, cubic spline postsmoothing, polynomial log-linear presmoothing, and kernel equating, respectively; The first 3 columns display results for frequency estimation, the next 3 columns for modified frequency estimation, and the last 3 columns for chained equipercentile equating; Within each equating method, results for 3 sample sizes including 300, 2,000, and 6,000 are given.

145 133 Figure B3. Smoothing Methods Producing the Smallest WSE for Different Group Differences. Note. Red, Yellow, and Green cells indicate cubic spline postsmoothing, polynomial loglinear presmoothing, and kernel equating, respectively; The first 3 columns display results for frequency estimation, the next 3 columns for modified frequency estimation, and the last 3 columns for chained equipercentile equating; Within each equating method, results for 3 sample sizes including 300, 2,000, and 6,000 are given.

146 134 Figure B4. Smoothing Methods Producing the Smallest WRMSE for Different Group Differences. Note. Red, Yellow, and Green cells indicate cubic spline postsmoothing, polynomial loglinear presmoothing, and kernel equating, respectively; The first 3 columns display results for frequency estimation, the next 3 columns for modified frequency estimation, and the last 3 columns for chained equipercentile equating; Within each equating method, results for 3 sample sizes including 300, 2,000, and 6,000 are given.

147 135 Figure B5. Smoothing Methods Producing the Smallest WRMSE for Different Proportion of Common Items. Note. Red, Yellow, and Green cells indicate cubic spline postsmoothing, polynomial loglinear presmoothing, and kernel equating, respectively; The first 3 columns display results for frequency estimation, the next 3 columns for modified frequency estimation, and the last 3 columns for chained equipercentile equating; Within each equating method, results for 3 sample sizes including 300, 2,000, and 6,000 are given.

Figure B6. Smoothing Methods Producing the Smallest WRMSE for Different Form Differences. Note. Red, Yellow, and Green cells indicate cubic spline postsmoothing, polynomial log-linear presmoothing, and kernel equating, respectively; The first 3 columns display results for frequency estimation, the next 3 columns for modified frequency estimation, and the last 3 columns for chained equipercentile equating; Within each equating method, results for 3 sample sizes including 300, 2,000, and 6,000 are given.

Figure B7. Smoothing Methods Producing the Smallest WRMSE for Different Common Item Types. Note. Red, Yellow, and Green cells indicate cubic spline postsmoothing, polynomial log-linear presmoothing, and kernel equating, respectively; The first 3 columns display results for frequency estimation, the next 3 columns for modified frequency estimation, and the last 3 columns for chained equipercentile equating; Within each equating method, results for 3 sample sizes including 300, 2,000, and 6,000 are given.

Figure B8. Smoothing Methods Producing the Smallest WRMSE for Different Spreads of Difficulty Parameters in the Common-Item Sets. Note. Red, Yellow, and Green cells indicate cubic spline postsmoothing, polynomial log-linear presmoothing, and kernel equating, respectively; The first 3 columns display results for frequency estimation, the next 3 columns for modified frequency estimation, and the last 3 columns for chained equipercentile equating; Within each equating method, results for 3 sample sizes including 300, 2,000, and 6,000 are given.
