Improving the Post-Smoothing of Test Norms with Kernel Smoothing


Anli Lin, Qing Yi, Michael J. Young
Pearson

Paper presented at the Annual Meeting of the National Council on Measurement in Education, May 1-3, 2010, Denver, CO.

Abstract

The traditional methodology of post-smoothing to develop norms for educational and clinical products is to hand-smooth the scale scores or their distributions. This approach is very subjective, difficult to replicate, and extremely labor intensive. In hand-smoothing, the scores or distributions are adjusted based on personal judgment; different persons, or the same person at different times, will make significantly different judgments. By contrast, the kernel smoothing method is a nonparametric approach that is more flexible, less subjective, and easier to replicate. Kernel smoothing has a small standard error and works well for either large or small samples. The process also has the potential to be fully automated. The kernel smoothing method has been applied in test equating, where it has shown promising results. The purpose of this study is to examine the use of the kernel smoothing approach to improve the post-smoothing of test norms, specifically, to remove reversals in CDFs.

Introduction

The traditional methodology to develop norms for educational products consists of the following steps. The first step is to obtain the scale score cumulative distribution functions (CDFs) of specific grades or age groups in a given content area for which the norms are being developed. This is followed by a pre-smoothing step that involves fitting third- or fourth-degree polynomials to the CDFs in order to smooth out the statistical fluctuations in the data. Descriptive statistics (typically the first four moments) for the original and fitted distributions are reviewed, along with tables of key percentile ranks. The final step consists of examining plots of the CDFs across grades or age groups and post-smoothing.

One of the purposes of post-smoothing is to remove reversals in the distributions. This is important in order to maintain the internal consistency of the norm sets. For an internally consistent set of norms, we would expect the percentile rank for a given score to be higher at, say, grade 4 and lower at grade 5. When a reversal occurs, this expectation is violated.

Currently, the traditional approach to post-smoothing for educational and clinical products is to hand-smooth the scale scores (SS) or their distributions. This approach is very subjective, difficult to replicate, and extremely labor intensive. In hand-smoothing, the scores or distributions are adjusted based on personal judgments; different persons, or the same person at different times, will make significantly different judgments. The post-smoothing process differs depending on the product and its purpose. For example, in some intelligence tests, post-smoothing adjusts SS against raw scores (RS) under a theoretical (say, normal) distribution; in some achievement tests, post-smoothing adjusts the percentile rank against SS or RS. The second process, that used for achievement tests, was used in the current study. The process is similar in the first situation, that of intelligence tests.

In contrast, the kernel smoothing method is a nonparametric approach that is more flexible, less subjective, and easier to replicate. Kernel smoothing has a small standard error and works well for either large or small samples. The process also has the potential to be fully automated. The kernel smoothing method has been applied in test equating, where it has shown promising results (von Davier, Holland, & Thayer, 2004). The purpose of this study was to

examine the use of the kernel smoothing approach to improve the post-smoothing of test norms, specifically, to remove reversals in CDFs.

Method

By the definition of the univariate kernel density estimator (Wand & Jones, 1995, p. 42), the following formula is a kernel density function:

$$\hat{f}(x; h) = (nh)^{-1} \sum_{i=1}^{n} K\{(x - X_i)/h\}, \quad (1)$$

where $K$ satisfies $\int K(x)\,dx = 1$. In Equation (1), $h$ is a positive number called the bandwidth, or smoothing parameter. Adjusting $h$ changes the shape of the curve, so we can fit the data or smooth the curve by choosing different values of $h$. Theoretically, we could choose any kernel density function. For this study we chose the kernel cumulative distribution function described by von Davier, Holland, and Thayer (2004, p. 58):

$$F_{h_X}(x) = \Pr\{X(h_X) \le x\} = \sum_j r_j \,\Phi(R_{jX}(x)), \quad (2)$$

$$X(h_X) = a_X (X + h_X V) + (1 - a_X)\mu_X, \quad (3)$$

and

$$R_{jX}(x) = \frac{x - a_X x_j - (1 - a_X)\mu_X}{a_X h_X}. \quad (4)$$

In Equation (2), $\Phi(\cdot)$ stands for the cumulative distribution function of the standard normal distribution, $r_j$ represents the probability of score $x_j$, $h_X$ stands for the bandwidth, or smoothing parameter, $X$ is the test score, $\mu_X$ represents the score expectation, $V$ is independent of $X$ and has the standard normal $N(0, 1)$ distribution, and $a_X$ is defined by $a_X^2 = \sigma_X^2 / (\sigma_X^2 + h_X^2)$, where $\sigma_X^2$ is the variance of the test score. For Equation (3), the following statements hold:

$$\lim_{h_X \to 0} X(h_X) = X \quad (5)$$

and

$$\lim_{h_X \to \infty} X(h_X) = \sigma_X V + \mu_X. \quad (6)$$

$X(h_X)$ keeps the mean and standard deviation unchanged. From Equation (5), we know that if we use a very small smoothing parameter (close to zero), $X(h_X)$ will be very close to $X$; that is, we make only a very small change to the original score distribution. From Equation (6), we know that if we use a very large smoothing parameter (usually more than ten times $\sigma_X$), we will get a distribution close to a normal distribution with mean $\mu_X$ and standard deviation $\sigma_X$. Therefore, different values of the smoothing parameter yield different smoothed continuous score distributions.

The choice of the value of the bandwidth $h$ is more important than the choice among kernel density functions (Wand & Jones, 1995, p. 13). However, sometimes we cannot smooth the data well by adjusting the value of $h$ alone. This makes sense because we cannot freely adjust the shape of the curve over the whole score range with a single parameter. To increase the smoothing flexibility so that it can be used in the current study, the kernel smoothing function in Equation (2) needs to be revised based on the variable kernel density estimator (Wand & Jones, 1995, p. 42):

$$\hat{f}_V(x; \alpha) = n^{-1} \sum_{i=1}^{n} \{\alpha(X_i)\}^{-1} K\{(x - X_i)/\alpha(X_i)\}. \quad (7)$$

In Equation (7), $K(\cdot)$ represents the kernel density and $\alpha(X_i)$ is the smoothing parameter. The main feature of kernel smoothing is that the data "speak for themselves" (Wand & Jones, 1995, p. 3), which means that the data decide which function fits best. We need to look at the data obtained from different tests and then decide whether we have chosen enough parameters to smooth the distributions, adding other smoothing parameters if necessary. For example, we could add parameters to adjust the mean, standard deviation, or other moments of the score distribution. Theoretically, there are two criteria for evaluating the kernel smoothing results and selecting the bandwidth (von Davier et al., 2004): best fit and minimum number of modes.
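As a concrete illustration of Equations (2)-(6), the Gaussian-kernel CDF can be sketched in a few lines. This is a minimal sketch for a toy discrete score distribution; the scores and probabilities are hypothetical and are not taken from the study data. It also shows the limit behavior of Equations (5) and (6): a small bandwidth leaves the empirical CDF nearly unchanged, while a large bandwidth pushes it toward a normal CDF with the same mean and standard deviation.

```python
import math

def kernel_cdf(x, scores, probs, h):
    """Gaussian-kernel smoothed CDF following Equations (2)-(4):
    F_{h_X}(x) = sum_j r_j * Phi(R_jX(x)), with a_X chosen so that the
    mean and standard deviation of the score distribution are preserved."""
    mu = sum(r * s for r, s in zip(probs, scores))               # mu_X
    var = sum(r * (s - mu) ** 2 for r, s in zip(probs, scores))  # sigma_X^2
    a = math.sqrt(var / (var + h * h))   # a_X^2 = sigma^2 / (sigma^2 + h^2)

    def phi(z):  # standard normal CDF, Phi(z)
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

    return sum(r * phi((x - a * s - (1.0 - a) * mu) / (a * h))
               for r, s in zip(probs, scores))

# Toy discrete score distribution (hypothetical, for illustration only).
scores = [10, 20, 30, 40]
probs = [0.1, 0.4, 0.3, 0.2]

# Equation (5): with a small bandwidth the smoothed CDF stays close to the
# empirical step CDF; at x = 25 the empirical CDF is 0.1 + 0.4 = 0.5.
print(kernel_cdf(25.0, scores, probs, h=0.5))    # close to 0.5

# Equation (6): with a large bandwidth the smoothed CDF approaches a normal
# CDF with the same mean (26) and standard deviation (sqrt(84)).
print(kernel_cdf(25.0, scores, probs, h=100.0))
```

Choosing $h$ between these two extremes trades fidelity to the data against smoothness, which is exactly the bandwidth-selection problem addressed by the criteria below.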
Best fit means that the smoothed CDFs should be as close to the original CDFs as possible; minimum number of modes means that the smoothed CDFs should have as few modes as possible. The practical criteria for examining the results of kernel smoothing are the lack of

reversals and minimal change to the moments (mean, standard deviation, skewness, kurtosis, etc.) in the kernel-smoothed norm sets.

The revised kernel smoothing function used in this study was programmed in the SAS programming language. (Note that the PROC KDE procedure included in the SAS software does not have the flexibility of adjusting the smoothing parameters or including other smoothing parameters.) User data from a norm-referenced test were used for the study. The SAS programs developed were used to prepare the scale score distributions, pre-smooth the distributions, produce the descriptive statistics and CDF plots, and conduct the kernel smoothing.

Results

Figure 1 presents the CDFs of the scale scores for the empirical data, which are unsmoothed and have reversals. For example, the curve for level 8 crosses over the curves for levels 9 and 10, and there are reversals in the lower part of the curves for levels 12 and 13.

[Figure 1. Empirical Data CDF Plots of Scale Scores: CDF (0.0-1.0) against scale score (300-900) for levels 3-13.]

Figure 2 presents the pre-smoothed CDF plots, which still have reversals. The logistic model was used for fitting the data, and the first four moments (mean, standard deviation, skewness, and

kurtosis) were kept. The purpose of pre-smoothing is to remove random errors. Any smoothing process might also introduce systematic errors; we expect the total (random plus systematic) error to be smaller after smoothing.

[Figure 2. Pre-Smoothed CDF Plots of Scale Scores: CDF (0.0-1.0) against scale score (300-900) for levels 3-13.]

Figure 3 presents the post-smoothed CDF plots produced with the kernel smoothing technique; the curves are smooth and have no reversals. A practical issue to consider in removing a reversal is this: when two curves cross over, do we need to lift one curve, lower the other, or do both? For example, in Figure 2, the curve for level 8 crosses over the curves for levels 9 and 10. To remove the reversal, we could take the approach of (a) lifting the level 8 curve; (b) lowering the level 9 and 10 curves; or (c) doing both (a) and (b). In the current study, we took approach (c): we lifted the upper part of the level 8 curve and lowered the upper parts of the level 9 and 10 curves.
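Finding where reversals occur can be automated before deciding which curves to lift or lower. The sketch below is illustrative only; the normal CDFs and level labels are hypothetical stand-ins, not the study's smoothed curves. It flags every score at which the lower level's CDF dips below the next level's, i.e., where the percentile rank for a given score would be higher at the upper level.

```python
import math

def norm_cdf(x, mu, sd):
    """Standard normal CDF evaluated at the standardized score."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sd * math.sqrt(2.0))))

def find_reversals(cdf_lower, cdf_upper, xs):
    """Scores at which the lower level's CDF falls below the next level's.
    At such scores the percentile rank for a given score would be higher
    at the upper level, violating the norm set's internal consistency."""
    return [x for x in xs if cdf_lower(x) < cdf_upper(x)]

# Hypothetical adjacent-level score distributions; the wider upper-level
# curve crosses the lower-level curve in the low-score region.
lower = lambda x: norm_cdf(x, 50.0, 5.0)    # e.g., level 8
upper = lambda x: norm_cdf(x, 52.0, 15.0)   # e.g., level 9

reversals = find_reversals(lower, upper, range(30, 71))
print(reversals)  # reversals occur at scores below the crossing point (~49)
```

Once the reversal region is flagged, the lifting or lowering adjustments described above can be applied only where the curves actually cross.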

[Figure 3. Post-Smoothed CDF Plots of Scale Scores with Kernel Smoothing: CDF (0.0-1.0) against scale score (300-900) for levels 3-13.]

Table 1 presents a comparison of the moments between the empirical data and the smoothed data. We can see that the means and standard deviations are very close between the unsmoothed and smoothed data, with the exception of level 8, which has serious reversals with levels 9 and 10. When we remove the reversal by lifting the upper part of the level 8 curve, its standard deviation goes down. There are minimal changes in skewness and kurtosis for most of the curves.
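The kind of moment comparison shown in Table 1 can be reproduced directly from a score frequency distribution. The following is a minimal sketch using population-moment formulas; note that SAS, which the study used, applies small-sample corrections to skewness and kurtosis, so its values will differ slightly. The toy scores and frequencies are hypothetical.

```python
import math

def moments(scores, freqs):
    """First four moments of a discrete score distribution:
    mean, standard deviation, skewness, and excess kurtosis.
    (Population formulas; SAS applies small-sample corrections.)"""
    n = sum(freqs)
    mean = sum(f * s for s, f in zip(scores, freqs)) / n
    m2 = sum(f * (s - mean) ** 2 for s, f in zip(scores, freqs)) / n
    m3 = sum(f * (s - mean) ** 3 for s, f in zip(scores, freqs)) / n
    m4 = sum(f * (s - mean) ** 4 for s, f in zip(scores, freqs)) / n
    sd = math.sqrt(m2)
    return mean, sd, m3 / sd ** 3, m4 / m2 ** 2 - 3.0

# Symmetric toy distribution: zero skewness, and for this flat-topped shape
# the excess kurtosis comes out negative (flatter than the normal curve).
scores = [1, 2, 3, 4, 5]
freqs = [1, 2, 4, 2, 1]
print(moments(scores, freqs))
```

Comparing these four values before and after smoothing, as in Table 1, is the "minimal change to moments" check described in the Method section.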

Table 1. Moments Comparison between the Empirical Data and Smoothed Data

Level       N   Data       Mean     STD     Skewness   Kurtosis
3      45,896   Empirical  567.455  43.252  -0.21504   -0.21384
                Smoothed   567.234  42.602  -0.25224   -0.22710
4      42,317   Empirical  611.664  41.970  -0.07297   -0.03066
                Smoothed   611.781  41.508  -0.11169   -0.07121
5      41,018   Empirical  632.016  39.313  -0.03699   -0.04128
                Smoothed   632.455  39.219  -0.04998   -0.07179
6      43,271   Empirical  639.985  38.474   0.11664   -0.06873
                Smoothed   640.448  38.412   0.10484   -0.09685
7      42,423   Empirical  656.160  34.180   0.05560    0.07375
                Smoothed   656.645  34.153   0.04837    0.05176
8      41,998   Empirical  671.463  40.561   0.22591   -0.00054
                Smoothed   670.715  33.402   0.17686   -0.04923
9      39,110   Empirical  673.866  31.937  -0.06411    0.04721
                Smoothed   674.362  33.010  -0.06444    0.08242
10     36,939   Empirical  681.053  32.727  -0.02112   -0.13725
                Smoothed   681.547  32.717  -0.02332    0.04591
11     26,458   Empirical  686.847  32.265   0.01191    0.01558
                Smoothed   687.341  32.253   0.00839    0.13545
12     20,942   Empirical  694.342  33.620  -0.10517   -0.03900
                Smoothed   694.840  33.615  -0.10416    0.06529
13     20,962   Empirical  703.788  39.589   0.11549   -0.01817
                Smoothed   704.249  39.520   0.10294   -0.05056

Discussion

As mentioned above, hand-smoothing is very subjective, difficult to replicate, and extremely labor intensive. With the kernel smoothing method, we can obtain smoothing results more objectively and can easily document the smoothing parameters used in order to replicate the process. Considering that a typical multi-level achievement test series can require the production of dozens of norm sets, such an improvement in the process could yield considerable savings in labor costs while producing a better end product.

There are only 11 levels, or groups, in the current example, and while there are reversals in the data, they are not very serious. In some test norms, there might be about 50 groups (grades, levels, or ages), and the curves can be seriously interwoven, which makes hand smoothing very hard. A practical hand-smoothing strategy in this situation is: (a)
smooth one curve (grade, level, or age); (b) copy the smoothed curve to another grade, level, or age; (c) manipulate (shift or slightly adjust) the copied curve to meet the criteria; and (d) repeat steps (b) and (c) until finishing

all of the curves. This approach makes all the norm curves look the same or similar and lose their initial characteristics. Kernel smoothing technology avoids this issue and makes the creation of norm curves more defensible.

Although we can see some reversals in the curves, the means in the empirical data are in order, that is, monotonically increasing as the level increases. This means that even though reversals occur among the curves, they do not occur in the empirical means. Under some circumstances, reversals might occur in the means; for example, the mean of level 13 might be smaller than that of level 12. In that case, we would need to adjust the means before doing the post-smoothing.

Parents, teachers, principals, and district-level administrators may consider norm-referenced tests for providing information that is complementary to that of their state testing programs. Whereas NCLB tests have focused on criterion-referenced inferences, norm-referenced inferences have been integral to most test publishers' catalogs of educational products. Our use of the kernel smoothing approach takes an accepted statistical procedure that is already used in the testing community for test equating and extends it to deal with an important and practical measurement issue: the creation of accurate and informative norm-referenced testing information.

References

von Davier, A. A., Holland, P. W., & Thayer, D. T. (2004). The kernel method of test equating. New York: Springer.

Wand, M. P., & Jones, M. C. (1995). Kernel smoothing. New York: Chapman & Hall/CRC.