Effects of PROC EXPAND Data Interpolation on Time Series Modeling When the Data are Volatile or Complex

Similar documents
SAS Publishing SAS. Forecast Studio 1.4. User s Guide

Penetrating the Matrix Justin Z. Smith, William Gui Zupko II, U.S. Census Bureau, Suitland, MD

Key Terms and Concepts. Introduction

The Time Series Forecasting System Charles Hallahan, Economic Research Service/USDA, Washington, DC

RLMYPRINT.COM 30-DAY FREE NO-OBLIGATION TRIAL OF RANDOM LENGTHS MY PRINT.

J.D. Power and Associates Reports: Social Networking and Gaming Applications Driving Smartphone Usage and Revenue

Nigerian Telecommunications (Services) Sector Report Q2 2016

Nigerian Telecommunications Sector

QUARTERLY FORECAST REPORT 1ST QUARTER

MINI-PAPER A Gentle Introduction to the Analysis of Sequential Data

Data Analysis: Displaying Data - Deception with Graphs

SAS System Powers Web Measurement Solution at U S WEST

Nigerian Telecommunications (Services) Sector Report Q3 2016

Appendix A: Graph Types Available in OBIEE

How to use FSBForecast Excel add-in for regression analysis (July 2012 version)

Missing Data: What Are You Missing?

MIS2502: Data Analytics Dimensional Data Modeling. Jing Gong

All King County Summary Report

Time series. VCEcoverage. Area of study. Units 3 & 4 Data analysis

Practical 2: Using Minitab (not assessed, for practice only!)

Aaron Daniel Chia Huang Licai Huang Medhavi Sikaria Signal Processing: Forecasting and Modeling

Counting daily bridge users

Maryland Corn: Historical Basis and Price Information Fact Sheet 495

Seattle (NWMLS Areas: 140, 380, 385, 390, 700, 701, 705, 710) Summary

The Vision Council Winds of Change

13. Geospatio-temporal Data Analytics. Jacobs University Visualization and Computer Graphics Lab

Maryland Corn: Historical Basis and Price Information Fact Sheet 495

InterAct User. InterAct User (01/2011) Page 1

Further Maths Notes. Common Mistakes. Read the bold words in the exam! Always check data entry. Write equations in terms of variables

Contents:

ClimDex - Version 1.3. User's Guide

B.2 Measures of Central Tendency and Dispersion

Understanding and Comparing Distributions. Chapter 4

Scholz, Hill and Rambaldi: Weekly Hedonic House Price Indexes Discussion

Monitor Qlik Sense sites. Qlik Sense Copyright QlikTech International AB. All rights reserved.

The Total Network Volume chart shows the total traffic volume for the group of elements in the report.

Statistics & Analysis. A Comparison of PDLREG and GAM Procedures in Measuring Dynamic Effects

Midwest ISO Overview - ATC Customer Meeting. February 26, 2009

How App Ratings and Reviews Impact Rank on Google Play and the App Store

Paper CC-016. METHODOLOGY Suppose the data structure with m missing values for the row indices i=n-m+1,,n can be re-expressed by

REPORT ON TELECOMMUNICATIONS SERVICE QUALITY WINDSTREAM FLORIDA, INC.

Describing Data: Frequency Tables, Frequency Distributions, and Graphic Presentation

1Q17 RESULTS M AY / 2017

J.D. Power and Associates Reports: Rising Network Quality Issues Prompt a Higher Number of Calls among Wireless Business Customers To Report Problems

Summary Statistics. Closed Sales. Paid in Cash. New Pending Sales. New Listings. Median Sale Price. Average Sale Price. Median Days on Market

99 International Journal of Engineering, Science and Mathematics

How to use FSBforecast Excel add in for regression analysis

1.2 Characteristics of Function Graphs

A Project Network Protocol Based on Reliability Theory

Seattle (NWMLS Areas: 140, 380, 385, 390, 700, 701, 705, 710) Summary

Q: Which month has the lowest sale? Answer: Q:There are three consecutive months for which sale grow. What are they? Answer: Q: Which month

Catalyst Views. 1. Open your web browser and type in the URL address line:

Seattle (NWMLS Areas: 140, 380, 385, 390, 700, 701, 705, 710) Summary

Simulation Study: Introduction of Imputation. Methods for Missing Data in Longitudinal Analysis

Needs Driven Workflow Design

The Consequences of Poor Data Quality on Model Accuracy

Regional Group Nordic GRID DISTURBANCE AND FAULT STATISTICS

EPS GROWTH MEASURES FOR INCENTIVE PAYMENTS WHAT ARE THEY REALLY REWARDING?

Maryland Soybeans: Historical Basis and Price Information

Graphical Techniques for Displaying Multivariate Data

3.1 Graphing Using the First Derivative

Quantitative Vulnerability Assessment of Systems Software

INTERTANKO Vetting seminar 24 th October 2017

SAS IT Resource Management Forecasting. Setup Specification Document. A SAS White Paper

Airside Congestion. Airside Congestion

Speed Dating: Looping Through a Table Using Dates

Sales Drop, Inventories Fell, Prices Climb

J.D. Power and Associates Reports: Wireless Service Providers in Canada Make Substantial Customer Satisfaction Improvements

Verizon Wireless Ranks Highest in Wireless Network Quality Performance in Five Regions; U.S. Cellular Ranks Highest in One Region

How to Conduct a Heuristic Evaluation

DATA MINING AND WAREHOUSING

MIS2502: Data Analytics The Information Architecture of an Organization. Jing Gong

Statistics, Data Analysis & Econometrics

Economic Outlook. William Strauss Senior Economist and Economic Advisor Federal Reserve Bank of Chicago

Box Plots. OpenStax College

Sony Ericsson continues to invest for future growth

Traveler s Path to Purchase

Economic Update German American Chamber of Commerce

MATH 117 Statistical Methods for Management I Chapter Two

PROGRAMMING ROLLING REGRESSIONS IN SAS MICHAEL D. BOLDIN, UNIVERSITY OF PENNSYLVANIA, PHILADELPHIA, PA

Statistical Methods in Trending. Ron Spivey RETIRED Associate Director Global Complaints Tending Alcon Laboratories

Application of Clustering Techniques to Energy Data to Enhance Analysts Productivity

US Consumer Device Preference Report:

PRESS RELEASE January 16, 2009 Sony Ericsson reports results for fourth quarter and full year 2008

'Numerical Reasoning' Results For JOE BLOGGS. Numerical Reasoning. Summary

Traffic Trend 4.1 General 4.2 Communication traffic growth and traffic nature trend Communication traffic growth

Section 2.1 Graphs. The Coordinate Plane

Non Linear Calibration Curve and Polynomial

Survey: Users Share Their Storage Performance Needs. Jim Handy, Objective Analysis Thomas Coughlin, PhD, Coughlin Associates

Federal Reserve Bank of Chicago 22 nd Annual Economic Outlook Symposium. Automotive Outlook. Will Shearin December 5, 2008

MONITORING REPORT ON THE WEBSITE OF THE STATISTICAL SERVICE OF CYPRUS DECEMBER The report is issued by the.

An Optimal Battery Energy Storage Charge/Discharge Method

Question Bank. 4) It is the source of information later delivered to data marts.

MIS2502: Data Analytics Dimensional Data Modeling. Jing Gong

J.D. Power and Associates Reports: Overall Wireless Network Problem Rates Differ Considerably Based on Type of Usage Activity

Oxford Scientific Software Ltd

SAS Scalable Performance Data Server 4.3

ANNUAL REPORT OF HAIL STUDIES NEIL G, TOWERY AND RAND I OLSON. Report of Research Conducted. 15 May May For. The Country Companies

Developing a Dashboard to Aid in Effective Project Management

Artifact Calibration and Its Use in Calibrating 8.5 Digit DMMs

Transcription:

Effects of PROC EXPAND Data Interpolation on Time Series Modeling When the Data are Volatile or Complex Keiko I. Powers, Ph.D., J. D. Power and Associates, Westlake Village, CA ABSTRACT Discrete time series modeling, such as ARIMA, transfer function modeling, etc, assumes that the data are based on the equally spaced intervals of time and no missing data. In the case of missing data, data imputation has to be performed first before time series modeling, and PROC EXPAND of SAS/ETS allows interpolation of time series data. Using the automobile point-of-sales data, the present study investigates if and how PROC EXPAND data interpolation affects time series modeling, when the data are volatile (e.g., vehicle sales price) or complex (e.g., sales volumes, which exhibit seasonality with respect to week, month, and year). Just as with the study by Powers (2004) that investigated another PROC EXPAND function, i.e., frequency conversion, it is important to know the existence and nature of effects of missing data imputation when developing and interpreting time series models.. INTRODUCTION SAS/ETS, PROC EXPAND offers various data-management functions specifically for time series data that are useful for time series modelers. Knowing how these functions perform in various settings allow us to perform PROC EXPAND data management steps more efficiently. Powers (2004) investigated the frequency conversion feature of PROC EXPAND and recommended that the high-to-low frequency conversion should be handled with caution when the time series data contain seasonal components. Another key feature of PROC EXPAND is missing data interpolation. Because commonly-used time series approaches, such as the Box-Jenkins ARIMA or autoregression model, require data with no missing values, performing missing data interpolation with a statistically sound method is particularly important. In the general statistics domain, there is much literature on advanced approaches to missing data (e.g., Johnson, and Davis, 1998; Little and Rubin, 2002; Marsh, 2000; Yuan, 2000). There are new SAS procedures, PROC MI and PROC MIANALYZE, that handles some of the complex approaches (Yuan, 2000). When time series data are involved, special considerations should be given for missing data imputation, due to its time-related dependency. The illustration below highlights its unique property. In this example that shows the weekly time series of the average vehicle price, we have a missing data point, and missing data imputation was performed by a simple mean replacement using the historical mean. It is clear that this procedure is introducing an outlier data point. Since the data display a somewhat step-wise pattern, it would be more appropriate to use a local mean based on adjacent values rather than the global historical mean. Obviously, when the time series data are less volatile and do not display an upward or downward trend over time, missing data replacement by the historical mean would suffice, but special caution is needed when dealing with volatile or complex time series data. The purpose of the present study is to investigate how PROC EXPAND missing data interpolation performs with volatile or complex time series data. The analysis results help us better understand how to handle missing data for various time series modeling approaches. 1

Illustration of how historical mean replacement could create a problem in time series data $24,000 $23,000 $22,000 Price ($) $21,000 $20,000 Missing data was replaced by historical mean here $19,000 $18,000 1/2/2001 3/2/2001 5/2/2001 7/2/2001 9/2/2001 11/2/2001 1/2/2002 3/2/2002 5/2/2002 7/2/2002 9/2/2002 11/2/2002 1/2/2003 3/2/2003 5/2/2003 7/2/2003 9/2/2003 11/2/2003 1/2/2004 Time Period (by week) DATA AND PROCEDURE Power Information Network (PIN), a division of J. D. Power and Associates, has been collecting and archiving pointof-sale data from automobile retailers in the United States for over a decade. The sales data are electronically sent to the PIN computer system daily. The data consist of new and used vehicles purchased by consumers and cover 26 key markets (e.g., New York, Florida, Detroit/Chicago area, Texas, California, etc) with over 6,000 retailers currently providing data to PIN. The total cumulative number of transactions now exceeds 17 million records. The transaction data can be aggregated and summarized to produce historical data, which allow us to analyze key market indicators to monitor over time the automobile market in the USA. These indicators include various features, such as vehicle price, profit margin, consumer incentives, or days to turn (i.e., how many days it takes to sell). For the present study, effects of missing data interpolation on univariate time series modeling were investigated by simulating missing data situations with different numbers of consecutive missing data points introduced to originally complete time series variables. The number of missing observations was increased up to 10 percent of the data; for example, the weekly time series variables had 210 observations, and therefore scenarios with up to 20 missing observations were simulated. Three time series variables were studied with the following setups: (1) Days to Turn (DTT) time series data: displaying a volatile pattern a. Time series frequency: weekly b. Missing data: at two time periods stable period and volatile period c. Missing data duration: # of consecutive weeks= 4, 8, 12, 16, 20 d. Imputation method: (1) join (linear) and (2) spline (cubic) 2

(2) Sales Unit (SALE-W) time series data: displaying a complex seasonality pattern a. Time series frequency: weekly b. Missing data: at one time period starting on January 2002 c. Missing data duration: # of consecutive weeks = 4, 8, 12, 16, 20 d. Imputation method: (1) join (linear) and (2) spline (cubic) (3) Sales Unit (SALE-M) time series data: displaying a complex seasonality pattern a. Time series frequency: monthly b. Missing data: at one time period starting on January 2002 c. Missing data duration: # of consecutive months = 1, 2, 3, 4, 5 d. Imputation method: (1) join (linear) and (2) spline (cubic) Days to Turn (DTT) refers to the number of days it takes an automobile dealer to sell a vehicle, and Figure 1 shows the weekly mean DTT for a mid-size car from January 2000 through December 2003. As can be seen in Figure 1, the data are fairly volatile, displaying three clear hills. The pattern shows considerable variations from time to time due to an introduction of the new model year. The missing data simulation is conducted with two starting points for this variable from May 2000 and from September 2001 in order to assess possible differential effects of missing imputation between when the data are stable versus volatile. Sales Unit data refer to the number of vehicles sold by the PIN participating dealers. Two time-series variables, one weekly (SALE-W) and the other monthly (SALE-M) (see Figures 2 and 3, respectively), are prepared, both of which clearly show the existence of seasonality. For each setup, the missing data imputation pattern is first investigated by visual inspection of time series plots, and the effects of missing data imputation on ARIMA model specifications are compared for various setups stated above. It should be noted that the number of data points for the monthly SALE variable is 48, which is less than the number of data points recommended for univariate time series analysis. This setup is included to illustrate the impact of missing data when the number of data points is not ideal. Figure 1. Weekly Time Series of Days to Turn 60 50 40 # of Days 30 20 10 0 1/2/2001 3/2/2001 5/2/2001 7/2/2001 9/2/2001 11/2/2001 1/2/2002 3/2/2002 5/2/2002 7/2/2002 9/2/2002 11/2/2002 1/2/2003 3/2/2003 5/2/2003 7/2/2003 9/2/2003 11/2/2003 1/2/2004 Time Period (by week) 3

Figure 2. Weekly Time Series of # of Units Sold 3000 2500 2000 # of Units 1500 1000 500 0 1/2/2001 3/2/2001 5/2/2001 7/2/2001 9/2/2001 11/2/2001 1/2/2002 3/2/2002 5/2/2002 7/2/2002 9/2/2002 11/2/2002 1/2/2003 3/2/2003 5/2/2003 7/2/2003 9/2/2003 11/2/2003 1/2/2004 Time Period (Weekly) Figure 3. Monthly Time Series of # of Units Sold 9000 8000 7000 6000 # of Units 5000 4000 3000 2000 1000 0 Jan-00 Mar-00 May-00 Jul-00 Sep-00 Nov-00 Jan-01 Mar-01 May-01 Jul-01 Sep-01 Nov-01 Jan-02 Mar-02 May-02 Jul-02 Sep-02 Nov-02 Jan-03 Mar-03 May-03 Jul-03 Sep-03 Nov-03 Time Period (Monthly) 4

For performing missing data interpolation, the SAS codes below were used: /* Days to Turn: missing data interpolation with method = spline */ proc expand data=d2 out=out1 from=week; id enddat; convert daystotu; run; /* Days to Turn: missing data interpolation with method = join */ proc expand data=d2 out=out1 from=week; id enddat; convert daystotu / method=join; run; The default method, SPLINE, fits a third-degree polynomial function for data interpolation, whereas the JOIN method is based on a linear function. These derived variables are then modeled with PROC ARIMA, and the resultant ARIMA specifications are compared against those based on the corresponding original variables. RESULTS AND DISCUSSION First, to visually inspect the performance of PROC EXPAND derived variables, these variables were plotted with the original variables. Figure 4. shows missing data interpolation results for Days to Turn variable, using the SPLINE and JOIN methods, when the numbers of missing data points are 4, 8, 12, 16, 20 weeks. The starting time point is May 2000 for these charts, or from the section of the time series where the data dynamics is fairly stable. Figure 4. Visual Inspection of Missing Data Interpolation of Days To Turn Variable with SPLINE and JOIN Methods Interpolation with SPLINE Method Interpolation with JOIN Method 60 55 50 4 weeks 16 weeks 20 weeks 60 55 50 Original 4 weeks 8 weeks 20 weeks 16 weeks # of Days 45 40 35 Original 8 weeks 12 weeks # of Days 45 40 35 12 weeks 30 30 25 25 20 2/2/2000 4/2/2000 6/2/2000 8/2/2000 10/2/2000 12/2/2000 20 2/2/2000 4/2/2000 6/2/2000 8/2/2000 10/2/2000 12/2/2000 Time Period Time Period In general, observed differences in the interpolated values are small between the two methods when the time series variable exhibits a smoother pattern. Compared to Days to Turn, the weekly variable for sales unit (SALE-W) showed a substantially different interpolation results between SPLINE and JOIN methods. The SPLINE method tends to exaggerate the upward or downward pattern, and when the time series variable consists of a series of spikes (e.g., weekly series for Sales Unit) the interpolated values could get extremely high or low. For example, the missing interpolation for the number of missing weeks being 12 and 16 resulted in considerably different patterns for this variable. 5

Next, the performance of PROC EXPAND data interpolation was investigated with Box- Jenkins ARIMA modeling. ARIMA models were developed for each of the three variables (weekly DTT, weekly SALE, and monthly SALE) using the original non-missing data. These model specifications were then applied to the set of the interpolated data to test if the original model specification will be recovered. The goodness of fit criteria (i.e., white noise tests with Q statistics, significance tests for parameter estimates) were used for this purpose. For example, the ARIMA model based on the original non-missing DTT was ARIMA (1,1,0). The same model specification was examined with data having varying numbers of missing data points from 4 to 20. The analysis results in Table 1 are based on the SPINE method. Since the ARIMA results were similar between the SPINE and JOIN methods, only the outcomes with the former approach are presented. The summary table indicates that the model specification for DTT was not affected by the interpolation procedure even when the number of missing data points was as high as 12, whether the missing data time period was for May 2000 or for September 2001. On the other hand, the two SALE variables displayed quite different patterns between the weekly series and monthly series. While the monthly series did not get affected by the missing data points, the weekly series did very poorly even when the number of missing data points was only 4 (or less than 2% of the 210 data points in the weekly SALE series). The poor results by the weekly SALE variable might be due to the spiky pattern of this variable. On the other hand, the short time series, SALE-M, with 48 observations, did not display any negative effects from the data interpolation despite its small number of observations. Table 1. ARIMA Results Based on Different Number of Consecutive Missing Data Points (1) Days to Turn (DTT) weekly time series variable # of consecutive missings ARIMA specification Parameter Estimates Q statistics on residuals 0: original series - no missing (1,1,0) (1- B) X t = 0.061 + (1 + 0.244 B) e t-1 Lag 6: n.s. Lag 12: n.s. Missing starts at May 2000 4 (1,1,0) (1- B) X t = 0.061 + (1 + 0.203 B) e t-1 Lag 6: n.s. Lag 12: n.s. 8 (1,1,0) (1- B) X t = 0.061 + (1 + 0.205 B) e t-1 Lag 6: n.s. Lag 12: n.s. 12 (1,1,0) (1- B) X t = 0.061 + (1 + 0.160 B) e t-1 Lag 6: n.s. Lag 12: n.s. 16 (1,1,0) no estimates due to Q stat results Lag 6: p<.05 Lag 12: n.s. 20 (1,1,0) no estimates due to Q stat results Lag 6: p<.05 Lag 12: n.s. Missing starts at Sep 2001 4 (1,1,0) (1- B) X t = 0.061 + (1 + 0.233 B) e t-1 Lag 6: n.s. Lag 12: n.s. 8 (1,1,0) (1- B) X t = 0.061 + (1 + 0.234 B) e t-1 Lag 6: n.s. Lag 12: n.s. 12 (1,1,0) (1- B) X t = 0.061 + (1 + 0.209 B) e t-1 Lag 6: n.s. Lag 12: n.s. 16 (1,1,0) (1- B) X t = 0.061 + (1 + 0.226 B) e t-1 Lag 6: n.s. Lag 12: n.s. 20 (1,1,0) no estimates due to Q stat results Lag 6: p<.01 Lag 12: p<.05 (2) Sales Unit (SALE-W) weekly time series variable # of consecutive missings ARIMA specification Parameter Estimates Q statistics on residuals 0: original series - no missing (1,0,0)(0,1,0) 4 (1- B 4 ) X t = 7.585 + (1-0.253 B) e t-1 Lag 6: n.s. Lag 12: n.s. 4 (1,0,0)(0,1,0) 4 no estimates due to Q stat results Lag 6: p<.05 Lag 12: p<.01 8 (1,0,0)(0,1,0) 4 no estimates due to Q stat results Lag 6: p<.01 Lag 12: p<.01 12 (1,0,0)(0,1,0) 4 no estimates due to Q stat results Lag 6: p<.01 Lag 12: p<.01 16 (1,0,0)(0,1,0) 4 no estimates due to Q stat results Lag 6: p<.01 Lag 12: p<.01 20 (1,0,0)(0,1,0) 4 no estimates due to Q stat results Lag 6: p<.01 Lag 12: p<.01 (3) Sales Unit (SALE-M) monthly time series variable # of consecutive missings ARIMA specification Parameter Estimates Q statistics on residuals 0: original series - no missing (2,0,0)(0,1,0) 12 X t = 5035 + (1-0.453 B -0.329B 2 )(1-0.864B 12 ) e t-1 Lag 6: n.s. Lag 12: n.s. 1 (2,0,0)(0,1,0) 12 X t = 5037 + (1-0.454 B -0.329B 2 )(1-0.863B 12 ) e t-1 Lag 6: n.s. Lag 12: n.s. 2 (2,0,0)(0,1,0) 12 X t = 5055 + (1-0.444 B -0.332B 2 )(1-0.855B 12 ) e t-1 Lag 6: n.s. Lag 12: n.s. 3 (2,0,0)(0,1,0) 12 X t = 5096 + (1-0.459 B -0.304B 2 )(1-0.847B 12 ) e t-1 Lag 6: n.s. Lag 12: n.s. 4 (2,0,0)(0,1,0) 12 X t = 5096 + (1-0.462 B -0.302B 2 )(1-0.845B 12 ) e t-1 Lag 6: n.s. Lag 12: n.s. 5 (2,0,0)(0,1,0) 12 X t = 5109 + (1-0.472 B -0.292B 2 )(1-0.832B 12 ) e t-1 Lag 6: n.s. Lag 12: n.s. Overall, the results indicated that the PROC EXPAND interpolated data perform well with ARIMA models for time series with smooth movements, such as DTT, even when the number of consecutive missing data points is fairly high. 6

On the other hand, when the data are spiky, as has been observed with the weekly SALE variable, the original ARIMA model specification fails to stand even when the number of missing observations is rather small. A possible reason is that the big zigzag movements between adjacent data points, present in the weekly SALE variable, are not captured and recovered well when missing data interpolation is performed. Therefore, when performing missing data interpolation to time series data with volatile or complex patterns, it is very important to first study the data dynamics carefully to make sure that missing data interpolation does not introduce data points that do not fit the overall trend or the cyclic pattern of the time series data. CONCLUSIONS Missing data imputation for time series variables requires special considerations because of the time-related dependency among observations. PROC EXPAND offers options for missing data interpolation, and understanding how it performs with various data patterns is important for applied time series modelers. For example, if there are a few missing observations in the middle of time series data, it is not wise to discard the portion before the missing data period, because many time series approaches require a fairly large number of observations At the same time, discrete time series approaches assume no missing time periods, and for this reason, it is valuable to have an approach that generates sound missing data imputation outcomes. Overall, the results from this study showed that PROC EXPAND is a valuable tool for this purpose, displaying minor impacts due to consecutive missing data points up to 10 percentages of the total number of observations. This is particularly true when the time series data display a smooth pattern with small adjacent point-to-point movements. On the other hand, when the data are complex and volatile with sharp increases/decreases between adjacent observations, visual inspection of the time series plot is highly recommended. REFERENCES Johnson, M.J. and Davis. P. (1998). The Effect of Missing Data on Sample Sizes for Repeated Measures Models. Proceedings of the Twenty-Third Annual SAS Users Group International Conference, Paper 231. Little R.J.A. and Rubin, D.B. (2002). Statistical Analysis with Missing Data, 2nd Ed. New York: John Wiley & Sons, Inc. Marsh, L.C. (2000). Correcting for Missing Discrete Responses in Business Surveys. Proceedings of the Twenty- Fifth Annual SAS Users Group International Conference, Paper 269-25. Powers, K.I. (2004). Empirical Investigation of Time Series Frequency Conversion with PROC EXPAND. Proceedings of the Western Users of SAS Software: 12th Annual Conference in Pasadena, CA. Yuan, Y.C. (2000). Multiple Imputation for Missing Data: Concepts and New Development. Proceedings of the Twenty-Fifth Annual SAS Users Group International Conference, Paper 267-25. CONTACT INFORMATION Your comments and questions are valued and encouraged. Contact the author at: Keiko I. Powers, Ph.D. J. D. Power and Associates 2625 Townsgate Road Westlake Village, CA 91361 Work Phone: 805-418-8114 Fax: 805-418-8241 E-mail Address: Keiko.Powers@jdpa.com SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. indicates USA registration. Other brand and product names are trademarks of their respective companies. 7