Adjustment methods for differential measurement errors in multimode surveys

Salah Merad, UK Office for National Statistics
ESSnet MM DCSS, Final Meeting, Wiesbaden, Germany, 4-5 September 2014

Outline
- Introduction
- Stabilising bias - CBS contribution
- Unit level adjustment for bias removal - ONS contribution
- Conclusion/Further work

Introduction
- Measurement errors in surveys have always been a problem, but have generally been ignored
- Problem becomes more apparent when additional modes are used
  - When share of web responses varies over time (CBS)
- Impact may be more important in attitudinal questions
- Measurement effects and selection effects are difficult to separate: complex experiments are needed (Vannieuwenhuyze and Revilla, 2013, and Schouten et al., 2013)
- Focus is on differential measurement errors only
- Results of contributions from CBS and ONS

Stabilising bias - CBS contribution
- In the context of varying take-up of modes over time
- If non-zero mode-dependent measurement errors are at play, the variations in mode composition will lead to a variation in total bias in the estimates
- Objective of method: stabilise mode-specific measurement bias
- Method developed for sequential multimode designs
- Applied to Dutch LFS

[Figure: Mode composition in wave 1 of Dutch LFS]

Stabilising bias by constraining mode mixture
- Correct for selectivity as usual (standard calibration)
- In addition, calibrate response modes to fixed proportions
- Do not choose extreme calibration proportions
- Here: CAWI 44%, CATI 26%, CAPI 30%
- Calibration occurs once and applies to all target variables
- Standard errors increase
- Details of method in Buelens and Van den Brakel, 2014, SMR
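The idea of calibrating response modes to fixed proportions can be illustrated with a minimal sketch. It is a deliberate simplification: a single ratio adjustment per mode, whereas the method described above calibrates on mode jointly with the usual auxiliary variables. The function name and the toy data are hypothetical.

```python
import numpy as np

def calibrate_to_mode_shares(weights, modes, target_shares):
    """Rescale weights so that weighted mode shares match target_shares.

    Simplification (assumption): one ratio adjustment per mode, with the
    overall weighted total left unchanged.
    """
    weights = np.asarray(weights, dtype=float)
    modes = np.asarray(modes)
    total = weights.sum()
    new_weights = weights.copy()
    for mode, share in target_shares.items():
        in_mode = modes == mode
        mode_total = weights[in_mode].sum()
        if mode_total == 0:
            raise ValueError(f"no respondents in mode {mode!r}")
        # Scale this mode's weights so it carries `share` of the total.
        new_weights[in_mode] *= share * total / mode_total
    return new_weights

# Hypothetical toy data: six respondents with equal starting weights.
weights = [100.0] * 6
modes = ["CAWI", "CAWI", "CAWI", "CATI", "CATI", "CAPI"]
targets = {"CAWI": 0.44, "CATI": 0.26, "CAPI": 0.30}
calibrated = calibrate_to_mode_shares(weights, modes, targets)
```

Because the same adjusted weights are then used for every target variable, the adjustment is made once, which matches the point above that calibration applies to all target variables.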

Evaluating alternative mixtures

Alternative    Mode composition*            Explanation
regular        variable                     GREG in current use in production
calbalanced    CAWI 44, CATI 26, CAPI 30    mode calibrated to average levels
callesscapi    CAWI 35, CATI 60, CAPI 5     mode calibrated, neglecting CAPI
callesscati    CAWI 35, CATI 5, CAPI 60     mode calibrated, neglecting CATI

Alternative method - Combined Prediction estimator
- Developed by Suzer-Gurtekin, 2013, PhD thesis, Univ. of Michigan
- Estimate measurement error explicitly using a model: regress y on x and mode
- Use fitted model to predict responses under alternative modes
- Use these to produce mode-specific estimators
- Combine these estimators to obtain a final estimate
- Here: use combination same as calibration levels
- Variance estimates through bootstrapping
- Needs to be done for every target variable
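The steps of the combined prediction approach can be sketched as follows. This is an illustration only, assuming a linear model with additive mode effects fitted by ordinary least squares; the function name, the model form and the toy data are assumptions, and the bootstrap variance step is not shown.

```python
import numpy as np

def combined_prediction(y, x, modes, weights, mixture):
    """Sketch of a combined prediction estimate of a mean.

    Assumption: y = b0 + b1 * x + (mode effect), fitted by OLS, with the
    first key of `mixture` used as the baseline mode.
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    w = np.asarray(weights, dtype=float)
    modes = np.asarray(modes)
    levels = list(mixture)

    def design(mode_labels):
        # Intercept, covariate, and one dummy per non-baseline mode.
        dummies = [(mode_labels == m).astype(float) for m in levels[1:]]
        return np.column_stack([np.ones_like(x), x] + dummies)

    beta, *_ = np.linalg.lstsq(design(modes), y, rcond=None)

    estimate = 0.0
    for m in levels:
        # Predict every unit's response as if it had answered in mode m.
        counterfactual = np.full(modes.shape, m)
        mode_specific = np.average(design(counterfactual) @ beta, weights=w)
        estimate += mixture[m] * mode_specific
    return estimate

# Hypothetical toy data with exact mode effects 0, 0.5 and 1.0.
x = np.arange(6, dtype=float)
modes = np.array(["CAWI", "CATI", "CAPI", "CAWI", "CATI", "CAPI"])
effect = {"CAWI": 0.0, "CATI": 0.5, "CAPI": 1.0}
y = 1.0 + 2.0 * x + np.array([effect[m] for m in modes])
mixture = {"CAWI": 0.44, "CATI": 0.26, "CAPI": 0.30}
estimate = combined_prediction(y, x, modes, np.ones(6), mixture)
```

Unlike the calibration approach, the model has to be refitted for each target variable, which is the cost noted in the last bullet above.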

[Figure: Comparing methods (1)]

Comparing methods (2)

Alternative             Estimated number unemployed    Standard error
regular                 641,021                        6,947
calibration balanced    636,276                        6,966
combined prediction     647,672                        6,749

Removing bias via unit level adjustment - ONS contribution
- Sample survey to estimate total/mean of y
- Two modes are used
  - Sample 1: data collected face to face (FtF)
  - Sample 2: data collected online (web)
- Assume one of the modes is subject to little or no measurement error for y; assume it is FtF (reference mode)
- Assume that some variables are not subject to measurement effects: $x = (x_1, x_2, \ldots, x_p)$
- Want to compute the conditional distribution $f(y^{FtF} \mid \text{mode} = \text{web}, y^{Web}, x)$
- We attempt to extend the method of Kim (ESRA, 2013), based on statistical measurement models

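Approach
- Use Bayes theorem:
  $f(y^{FtF} \mid \text{mode} = \text{web}, y^{Web}, x) \propto f(y^{FtF} \mid x)\, g(y^{Web} \mid y^{FtF})\, P(\text{mode} = \text{web} \mid y^{FtF}, x)$
- Assume ignorable mode selection:
  $P(\text{mode} = \text{web} \mid y^{FtF}, x) = P(\text{mode} = \text{web} \mid x)$
- Assuming ignorable mode selection yields:
  $f(y^{FtF} \mid \text{mode} = \text{web}, y^{Web}, x) \propto f(y^{FtF} \mid x)\, g(y^{Web} \mid y^{FtF})$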

Continuous variables case - Overview of solution in Kim (2013)
- Normal distribution case
- Structural model of truth: $y = x'\beta + e$, $e \sim N(0, \sigma_e^2)$
- Measurement error model:
  - $y^{FtF} = y$ (FtF same as truth)
  - $y^{Web} = y + u^{Web}$, $u^{Web} \sim N(0, \sigma_{uW}^2)$
- Assumes measurement error leads to no bias but increases variance

Overview of J-K Kim's solution (2)
- $y \mid (y^{Web}, x) \sim N(\hat{y}^{Web}, \alpha_W \sigma_e^2)$
  where $\hat{y}^{Web} = \alpha_W x'\beta + (1 - \alpha_W)\, y^{Web}$
  and $\alpha_W = \sigma_{uW}^2 / (\sigma_{uW}^2 + \sigma_e^2)$
- For non-normal distribution case, parametric fractional imputation (Kim, 2011) was used
- Have extended method to case where measurement error leads to bias: solution has same form
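For the normal case, the conditional distribution of the truth given a web response is a standard normal-normal update, and a small sketch makes the shrinkage explicit. The function name and the numerical values are illustrative assumptions; in practice the regression coefficients and the two variances would have to be estimated.

```python
import numpy as np

def conditional_truth_given_web(y_web, xb, sigma_e2, sigma_u2):
    """Mean and variance of y given (y_web, x) under the normal model.

    xb is the linear predictor x'beta, sigma_e2 the structural variance,
    sigma_u2 the web measurement error variance.
    """
    alpha = sigma_u2 / (sigma_u2 + sigma_e2)
    mean = alpha * xb + (1.0 - alpha) * y_web
    var = alpha * sigma_e2
    return mean, var

# Hypothetical values: noisy web answer 20, model prediction 10.
mean, var = conditional_truth_given_web(
    y_web=20.0, xb=10.0, sigma_e2=4.0, sigma_u2=1.0
)

# Imputed "FtF-equivalent" values can be drawn from this distribution.
rng = np.random.default_rng(0)
imputations = rng.normal(mean, np.sqrt(var), size=5)
```

The larger the web error variance relative to the structural variance, the more the adjusted value is pulled away from the reported web answer and towards the model prediction.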

Extending method to categorical variables - Binary case
- Consider a binary variable
- $f(y^{FtF} \mid \text{mode} = \text{web}, y^{Web}, x)$ can be expressed using:
  - $P(y^{FtF} = 1 \mid x)$
  - $P(y^{Web} = 1 \mid y^{FtF} = 0)$ and $P(y^{Web} = 0 \mid y^{FtF} = 1)$ (error probabilities)
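In the binary case the Bayes step reduces to a two-term ratio, sketched below. The function and argument names are hypothetical; `e01` stands for $P(y^{Web} = 1 \mid y^{FtF} = 0)$ and `e10` for $P(y^{Web} = 0 \mid y^{FtF} = 1)$, and both are assumed not to depend on x.

```python
def prob_ftf_one_given_web(y_web, p_ftf, e01, e10):
    """P(y_FtF = 1 | y_Web = y_web, x) under ignorable mode selection.

    p_ftf is P(y_FtF = 1 | x); e01 and e10 are the two error probabilities.
    """
    # Likelihood of the observed web answer under each true value.
    like_if_one = (1.0 - e10) if y_web == 1 else e10
    like_if_zero = e01 if y_web == 1 else (1.0 - e01)
    numerator = p_ftf * like_if_one
    return numerator / (numerator + (1.0 - p_ftf) * like_if_zero)

# Hypothetical values for illustration only.
p_given_web_yes = prob_ftf_one_given_web(1, p_ftf=0.6, e01=0.1, e10=0.2)
p_given_web_no = prob_ftf_one_given_web(0, p_ftf=0.6, e01=0.1, e10=0.2)
```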

Adjusting for binary variables (1)
- We propose to estimate $P(y^{FtF} = 1 \mid x)$ using data from the FtF sample by fitting a logistic regression model
- However, we cannot estimate error probabilities uniquely from regular survey data
- We would need a validation sample/experiment to estimate the error probabilities
  - Costly and estimation not so straightforward
  - Estimates may become out of date
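Fitting the logistic model on the FtF sample can be sketched with a plain Newton-Raphson loop. This is an unweighted illustration on simulated data; the function names, the simulated coefficients and the omission of survey weights are all assumptions made for brevity.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Unweighted logistic regression by Newton-Raphson (intercept added)."""
    X = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        gradient = X.T @ (y - p)
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hessian, gradient)
    return beta

def predict_proba(beta, X):
    """Predicted P(y = 1 | x) for each row of X."""
    X = np.column_stack([np.ones(X.shape[0]), X])
    return 1.0 / (1.0 + np.exp(-X @ beta))

# Simulated stand-in for the FtF sample (hypothetical coefficients).
rng = np.random.default_rng(0)
X_ftf = rng.normal(size=(500, 2))
true_p = 1.0 / (1.0 + np.exp(-(-0.5 + X_ftf @ np.array([1.0, -1.0]))))
y_ftf = (rng.random(500) < true_p).astype(float)

beta_hat = fit_logistic(X_ftf, y_ftf)
p_hat = predict_proba(beta_hat, X_ftf)
```

The fitted model can then be applied to the covariates of web respondents to obtain their predicted FtF probabilities.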

Adjusting for binary variables - consistency
- Can estimate overall measurement bias from regular survey data in a sequential design; use
  $\hat{U} = \sum_{s_{Web}} \omega \left( \mathrm{Pred}^{Web}_{1,x} - \mathrm{Pred}^{FtF}_{1,x} \right) / \sum_{s_{Web}} \omega$
  - $\mathrm{Pred}^{Web}_{1,x}$ is the predicted value obtained by applying the model fitted using web data
  - $\mathrm{Pred}^{FtF}_{1,x}$ is the predicted value obtained by applying the model fitted using FtF data
  - Survey weights $\omega$ need to be calibrated with respect to x
- Can write an equation relating overall measurement bias and the two error probabilities - no unique solution
- Error probabilities estimated from experiment may not satisfy equation: consistency problem
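The overall bias estimate is, in essence, a weighted comparison of the predictions from the web-fitted and FtF-fitted models. The sketch below computes it as a weighted mean difference (normalising by the sum of weights is an assumption) and then illustrates why one equation cannot pin down two error probabilities. The relation used in `implied_bias` follows from the binary measurement model only under the added assumption that the error probabilities do not depend on x; all names and numbers are hypothetical.

```python
import numpy as np

def overall_measurement_bias(weights, pred_web, pred_ftf):
    """Weighted mean difference between web-model and FtF-model predictions."""
    w = np.asarray(weights, dtype=float)
    diff = np.asarray(pred_web, dtype=float) - np.asarray(pred_ftf, dtype=float)
    return np.sum(w * diff) / np.sum(w)

def implied_bias(p_ftf, e01, e10):
    """Bias in the web proportion implied by the error probabilities.

    Assumes e01 = P(web = 1 | FtF = 0) and e10 = P(web = 0 | FtF = 1)
    are constant across x, so that
    P(web = 1) - P(FtF = 1) = (1 - p) * e01 - p * e10.
    """
    return (1.0 - p_ftf) * e01 - p_ftf * e10

# Hypothetical predictions for four web respondents.
u_hat = overall_measurement_bias(
    weights=[1.0, 2.0, 1.0, 2.0],
    pred_web=[0.70, 0.60, 0.80, 0.50],
    pred_ftf=[0.65, 0.55, 0.80, 0.45],
)

# Two different pairs of error probabilities give the same overall bias.
bias_a = implied_bias(p_ftf=0.6, e01=0.10, e10=0.00)
bias_b = implied_bias(p_ftf=0.6, e01=0.25, e10=0.10)
```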

Adjusting for binary variables - A heuristic solution for consistency
- When inconsistency is severe, we need to amend estimates of error probabilities
- Construct solutions in the neighbourhood of the solution obtained from the experiment and satisfying the equation
- Select the solution that leads to the smallest increase in variance of estimates
- Bias should be removed at the overall level, but estimates for subpopulations may be slightly biased
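One way to picture the heuristic is as a search along the line of error-probability pairs that reproduce the survey-based bias, restricted to a neighbourhood of the experimental estimate. The sketch below assumes the linear relation bias = (1 - p) * e01 - p * e10 (error probabilities constant across x), a Euclidean neighbourhood, and a user-supplied `variance_fn`; the slides do not specify any of these, and the variance function used in the example is only a placeholder, not the criterion used by the authors.

```python
import numpy as np

def consistent_candidates(p_ftf, bias, e_exp, radius=0.1, n_grid=201):
    """Pairs (e01, e10) that satisfy the bias equation near e_exp."""
    e01_exp, e10_exp = e_exp
    e01 = np.linspace(max(0.0, e01_exp - radius),
                      min(1.0, e01_exp + radius), n_grid)
    # Solve (1 - p) * e01 - p * e10 = bias for e10.
    e10 = ((1.0 - p_ftf) * e01 - bias) / p_ftf
    distance = np.hypot(e01 - e01_exp, e10 - e10_exp)
    keep = (e10 >= 0.0) & (e10 <= 1.0) & (distance <= radius)
    return np.column_stack([e01[keep], e10[keep]])

def select_solution(candidates, variance_fn):
    """Pick the candidate with the smallest value of variance_fn."""
    scores = [variance_fn(e01, e10) for e01, e10 in candidates]
    return candidates[int(np.argmin(scores))]

# Hypothetical inputs: experimental estimates (0.15, 0.05) imply a bias
# of 0.03, but the survey-based estimate is 0.04.
candidates = consistent_candidates(p_ftf=0.6, bias=0.04, e_exp=(0.15, 0.05))

# Placeholder criterion only: treats larger error probabilities as
# producing larger variance.
best_e01, best_e10 = select_solution(candidates, lambda e01, e10: e01 + e10)
```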

Application to online Opinions pilot (1)
- 2010 online pilot: November and December
- Split sample design: FtF Opinions survey as control
- Demographic questions; some LFS and OPN module questions
- One person interviewed in each selected HH
- Response for web survey poor in November (8%); better in December (17%)
  - letter amended and snow
- 54% response rate to FtF survey

Application to online Opinions pilot (2)
- Estimates of proportion of adults in employment
  - Calibration on age, sex and geography
  - FtF sample: 59%
  - Web sample: 66%
- Big part of difference should be due to web mode selection
- Used age, sex, region, marital status, level of education and tenure as covariates in logistic models
- Estimate of overall measurement bias found to be 4%
  - Rather large: self-selection not completely eliminated?
  - Survey weights calibrated only on age, sex and geography

[Figure: Unit level adjustments in online OPN pilot]

Conclusion/Further work
- Method for bias stabilisation is promising but needs further evaluation
- Not clear we can adjust for measurement bias using current survey data only for categorical data
  - Need a suitable experiment to estimate error probabilities
  - Need to investigate further the impact of using variance minimisation over neighbouring solutions that are consistent with survey data
- Ignorable mode selection is assumed in all methods, but availability of covariates to control for self-selection is a problem in many countries
- Adjustment methods have limitations: need to design questions that are not mode-sensitive

References
Buelens, B. and van den Brakel, J. (2014) Measurement error calibration in mixed-mode sample surveys. Sociological Methods & Research, published online May 12, doi: 10.1177/0049124114532444.
Kim, J-K (2011) Parametric fractional imputation for missing data analysis, Biometrika, Vol. 98, Issue 1, pp 119-132.
Kim, J-K (2013) An imputation approach for analysing mixed-mode survey, ESRA 2013 conference, July 2013, Ljubljana, Slovenia.
Schouten, B., van den Brakel, J., Buelens, B., van der Laan, J. and Klausch, T. (2013) Disentangling mode-specific selection and measurement bias in social surveys, Social Science Research, Volume 42, Issue 6, November 2013, pp 1555-1570.
Vannieuwenhuyze, J. T. A. and Revilla, M. (2013) Evaluating Relative Mode Effects on Data Quality in Mixed-Mode Surveys, Survey Research Methods, Vol. 7, No. 3, pp 157-168.