Industrialising Small Area Estimation at the Australian Bureau of Statistics
1 Industrialising Small Area Estimation at the Australian Bureau of Statistics Peter Radisich Australian Bureau of Statistics Workshop on Methods in Official Statistics - March
2 Outline Background Current method Industrialisation Alternative methods
3 BACKGROUND
4 What is a Small Area? Liberal use of the term Area Small Domain Estimation Areas with small sample size Standard methods not fit for purpose Some areas may have no sample
5 What is Small Area Estimation? The small area estimates are still simple Averages Proportions Totals Just lots of them!
6 Why are estimates for small areas needed? Resource allocation for services Planning and decision making at the local level Policy development and evaluation Research (social, health, labour market) Microeconomic analysis (sustainability of regional economies)
7 Why Use Small Area Estimation? ABS surveys Direct estimates (weighted totals and averages) Usually designed for state and national level Impractical to design for small areas Effort Cost Respondent burden
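As a rough illustration of the direct estimates mentioned on this slide (weighted totals and averages), the sketch below computes both for one area from unit records. The record layout `(area_id, weight, y)` and the function name are assumptions made for illustration, not something taken from the ABS systems.

```python
def direct_estimates(records, area):
    """Weighted total and weighted mean of y for one area.

    records: iterable of (area_id, weight, y) tuples (hypothetical layout).
    Returns (total, mean, n); mean is None when the area has no sample.
    """
    in_area = [(w, y) for a, w, y in records if a == area]
    n = len(in_area)
    weight_sum = sum(w for w, _ in in_area)
    total = sum(w * y for w, y in in_area)
    mean = total / weight_sum if weight_sum > 0 else None
    return total, mean, n


sample = [("A", 100.0, 1), ("A", 150.0, 0), ("B", 200.0, 1)]
print(direct_estimates(sample, "A"))  # area with two sampled units
print(direct_estimates(sample, "C"))  # area with no sample: no mean available
```

The second call shows the problem raised earlier in the talk: an area with no sample has no usable direct estimate at all.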
8 Why Use Small Area Estimation? Need for explicit statistical models Why will a model work? Provide greater access to rich sources of data Sharing data (borrowing strength) More data = better estimates Most of the time. Sharing data not always appropriate
9 How accurate are direct estimates for small areas? Relative Standard Errors for direct estimates of Needs assistance with mobility
10 Small Area Applications People with a disability Labour force status Undercount of Aboriginal and Torres Strait Islander peoples Health conditions Household energy consumption Household wealth Farm water use and agricultural practices Land use, land cover and crop yields
11 CURRENT METHOD As shown by extensive research and analysis, the mean square error would be halved, if you divided it by 2.
12 Current method generalised linear mixed models (GLMM) Usually logistic regression Random intercept at small area level Unit level modelling Ignore design weights Include design variables into the model
13 Small Area Methods Poisson Logistic Multinomial Linear Log Linear Models Synthetic Random effects Estimation WinBUGS MPQL REML Quality Diagnostics
14 Small Area Methods Poisson Logistic Multinomial Linear Log Linear Models Random effects Estimation WinBUGS INLA? MPQL PROC GLIMMIX REML Quality Diagnostics
15 Current SAE Process 1. Obtain and prepare data. 2. Build the regression model 3. Calculate predictions and measures of accuracy 4. Calibrate to published estimates 5. Quality assurance of predictions
17 Data sources Survey has the variable of interest Census Administrative Centrelink, Tax, Building Approvals, Weather, etc. Estimated resident population Anything else you can get access to.
18 Preparing the data End goal: two final data sets Sample data Population data
19 Preparing the data Sample Data Population Data
20 Preparing data Look for common data items Variable definition in Survey vs Census Look for important variables Contextual variables Easy to drown in data definitions Family Composition vs Family Type Data item lists for surveys are massive
21 Preparing data Understanding data takes time We have information in different people/sections Understand data, but not models Understand models, but not the data The search for explanatory variables When to stop? Interplay with knowing the data & modelling
22 Current SAE Process 1. Obtain and prepare data. 2. Build the regression model 3. Calculate predictions and measures of accuracy 4. Calibrate to published estimates 5. Quality assurance of predictions
23 Build regression model GLMM type model $y$: variable of interest; $y_s$ ($y_r$): sample (population) data set. $X$: explanatory variables ("fixed"); $X_s$ ($X_r$): sample (population) data set. $Z$: small area variable ("random"); $Z_s$ ($Z_r$): sample (population) data set.
24 Build regression model Sample Data: $y_s$, $X_s$, $Z_s$. Population Data: $y_r$, $X_r$, $Z_r$.
26 Build regression model We know everything except $y_r$. Fit a model using the sample ($y_s$, $X_s$, $Z_s$): model selection ("pruning" of columns in $X_s$), parameter estimates, standard errors. Predict using the population ($X_r$, $Z_r$).
27 Build regression model Rough description of model: $E(y_s \mid u) = h(X_s \beta + Z_s u)$, where $h(\eta) = \eta$ (linear), $h(\eta) = \exp(\eta)$ (log linear), or $h(\eta) = \exp(\eta) / (1 + \exp(\eta))$ (logistic).
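The three choices of $h$ named on this slide can be sketched as plain functions. This is only an illustration: the function names are made up here, and the logistic case is assumed to be the usual inverse logit, since the formula did not survive transcription cleanly.

```python
import math


def h_linear(eta):
    """Identity inverse link: linear model."""
    return eta


def h_log_linear(eta):
    """Exponential inverse link: log linear (e.g. Poisson) model."""
    return math.exp(eta)


def h_logistic(eta):
    """Inverse logit, written to avoid overflow for large |eta|."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


for eta in (-2.0, 0.0, 2.0):
    print(eta, h_linear(eta), h_log_linear(eta), h_logistic(eta))
```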
28 Build regression model Random effects: $u \sim \mathrm{Gaussian}(0, \varphi I)$. Parameters of the GLMM model: $\theta = (\beta, u, \varphi)$.
29 Build regression model Parameters of the model contain everything about $y_s$ that is relevant to predicting $y_r$. If we knew $\theta$ then we would not need the sample data.
30 Build regression model Sample Data Population Data Parameter estimates $\hat{\theta} = (\hat{\beta}, \hat{u}, \hat{\varphi})$
31 Build regression model Model selection Weak theory We expect many variables to be unimportant Raftery (1995) Bayesian Model Selection in Social Research Laundry list of possible explanatory variables
32 Build regression model Use BIC to select explanatory variables. Implied p-value = $\Pr(\chi^2_1 > \log n)$. Much smaller than usual. [Plot: implied p-value against sample size]
33 Build regression model Use BIC to select explanatory variables. Implied p-value = $\Pr(\chi^2_1 > \log n)$. Much smaller than usual 0.05.
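The implied p-value on this slide is easy to evaluate numerically. The sketch below uses only the standard library, relying on the identity that for a chi-square variable with one degree of freedom, Pr(X > x) = erfc(sqrt(x / 2)). The function name is a label chosen here, not one from the talk.

```python
import math


def bic_implied_p_value(n):
    """Pr(chi-square with 1 df > log(n)) for a sample of size n > 1."""
    # A chi-square(1) variable is the square of a standard normal, so
    # Pr(X > x) = Pr(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(math.log(n) / 2.0))


for n in (100, 1_000, 10_000, 100_000):
    print(n, bic_implied_p_value(n))
```

The implied threshold falls as the sample size grows, and it is already below 0.05 once log(n) exceeds about 3.84, that is, for samples of more than roughly 47 units.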
34 Build regression model Use BIC to select explanatory variables. If $\mathrm{CV}(\hat{\beta}_k) > 30\%$ then drop $X_k$ from the model. [Plot: $\mathrm{CV}(\hat{\beta}_k)$ cut-off against sample size, vertical axis from 25% to 35%]
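One way to read the CV rule, offered here as an interpretation rather than the authors' derivation: if dropping a single coefficient changes BIC by roughly z² − log(n), where z is the Wald ratio of the estimate to its standard error, then BIC prefers dropping the variable when z² < log(n). Taking CV to mean standard error divided by the absolute estimate, this is the same as CV > 1 / sqrt(log(n)), which is close to 30% for samples in the tens of thousands. The sketch below assumes that reading; the function names are hypothetical.

```python
import math


def cv_threshold(n):
    """CV cut-off implied by BIC under a Wald approximation (n > 1)."""
    return 1.0 / math.sqrt(math.log(n))


def keep_variable(beta_hat, std_err, n):
    """True when the coefficient's CV is at or below the implied cut-off."""
    cv = std_err / abs(beta_hat)
    return cv <= cv_threshold(n)


for n in (5_000, 50_000, 500_000):
    print(n, cv_threshold(n))
print(keep_variable(beta_hat=0.8, std_err=0.2, n=50_000))  # CV = 25%
print(keep_variable(beta_hat=0.8, std_err=0.4, n=50_000))  # CV = 50%
```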
35 Build regression model ... but No guarantee that model predictions will be close to survey estimates at broad level (Step 4: calibration). Use BIC, but constrain possible models to be close to direct estimates at broad level. Non-significant variables may be kept. Significant variables may be dropped.
36 Current SAE Process 1. Obtain and prepare data. 2. Build the regression model 3. Calculate predictions and measures of accuracy 4. Calibrate to published estimates 5. Quality assurance of predictions
37 Calculate predictions Calculate predictions by plugging in estimates (EBLUP): $E(y_r \mid u) = h(X_r \beta + Z_r u)$. Predictions only depend on sample through parameter estimates: $\hat{y}_r = h(X_r \hat{\beta} + Z_r \hat{u})$.
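The plug-in step can be sketched for the logistic, random-intercept case described earlier in the talk. Everything named here is an assumption for illustration: the population is a list of covariate rows with an area label per unit (the label plays the role of $Z_r$), and an area with no sample is given a random effect of zero, which is its prior mean.

```python
import math


def inverse_logit(eta):
    """Numerically stable inverse logit."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def predict_area_totals(X_r, area_r, beta_hat, u_hat):
    """Sum plug-in predicted probabilities over population units, by area.

    X_r: covariate rows; area_r: area label per row;
    beta_hat: fixed-effect estimates; u_hat: dict of predicted area effects.
    """
    totals = {}
    for x, area in zip(X_r, area_r):
        eta = sum(b * xj for b, xj in zip(beta_hat, x)) + u_hat.get(area, 0.0)
        totals[area] = totals.get(area, 0.0) + inverse_logit(eta)
    return totals


X_r = [[1.0, 0.0], [1.0, 1.0], [1.0, 1.0]]   # intercept plus one covariate
area_r = ["area_1", "area_1", "area_2"]
print(predict_area_totals(X_r, area_r, [-1.0, 0.5], {"area_1": 0.3}))
```

Note that the sample data never appear in the call: only the estimates do, which is the point the slide makes.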
38 Calculate predictions and measures of accuracy Sample Data Population Data Parameter estimates Small Area Estimates
39 Calculate measures of accuracy Primary measure of accuracy: Mean Square Error (MSE). Calculated by magic: $\mathrm{MSE}(\hat{y}_r) = G_1 + G_2 + 2G_3 + G_4$. MSE estimator only depends on sample data through parameter estimates.
40 Current SAE Process 1. Obtain and prepare data. 2. Build the regression model 3. Calculate predictions and MSEs 4. Calibrate to published estimates 5. Quality assurance of predictions
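41 Calibrate to published estimates Small Area Estimates created after key headline figures released. E.g. State level estimates of disability counts: $\sum_{d \in \text{NSW}} \hat{Y}_d = \hat{Y}_{\text{NSW}}$, where the left-hand side sums the estimates from the GLMM and the right-hand side is the direct estimate using survey weights.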
43 Calibrate to published estimates Also, the modelling is done variable by variable. For multicategory variables, we model each category separately as a binary variable. $\hat{Y}_{d,1}$: estimates of number of people with any disability. $\hat{Y}_{d,2}$: estimates of number of people with a mild disability.
45 Calibrate to published estimates Competing goals Want to publish our model based predictions Want to be coherent with other releases Want to be coherent with small area estimates for other variables Calibration! Implemented through GREGWT macro Some tricks used for large number of constraints
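To make the calibration goal concrete, here is a deliberately simple pro-rata adjustment that forces the small area estimates within a state to add up to the published state figure. This is not the GREGWT macro named on the slide, which handles many constraints at once; it is a single-constraint stand-in, and the function name and numbers are invented for illustration.

```python
def ratio_calibrate(area_estimates, benchmark_total):
    """Scale model-based area estimates so they sum to a published total."""
    model_total = sum(area_estimates.values())
    if model_total == 0:
        raise ValueError("cannot scale estimates that sum to zero")
    factor = benchmark_total / model_total
    return {area: value * factor for area, value in area_estimates.items()}


glmm_estimates = {"area_1": 400.0, "area_2": 350.0, "area_3": 250.0}
published_state_total = 1100.0  # illustrative direct estimate
calibrated = ratio_calibrate(glmm_estimates, published_state_total)
print(calibrated)
print(sum(calibrated.values()))
```

A single scaling factor keeps the relative ordering of areas but cannot satisfy several benchmarks at once, which is presumably why a general calibration tool is used in practice.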
46 Current SAE Process 1. Obtain and prepare data. 2. Build the regression model 3. Calculate predictions and MSEs 4. Calibrate to published estimates 5. Quality assurance of predictions
47 Quality Diagnostics for Small Area Estimates Relative Root Mean Square Errors (RRMSEs) Bias plots Check model assumptions, Goodness of Fit Consistency with direct estimates Spatial mapping
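The first diagnostic, the relative root mean square error, is the square root of the estimated MSE divided by the estimate itself. A minimal sketch follows, with hypothetical input values; the 25% flagging cut-off is an arbitrary choice for the example, not an ABS standard.

```python
import math


def rrmse(estimate, mse):
    """Relative root mean square error of a non-zero estimate."""
    return math.sqrt(mse) / abs(estimate)


areas = {"area_1": (200.0, 400.0), "area_2": (50.0, 625.0)}  # (estimate, MSE)
for name, (est, mse) in areas.items():
    value = rrmse(est, mse)
    flag = "check" if value > 0.25 else "ok"  # illustrative cut-off only
    print(name, value, flag)
```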
48 Bias plot 261 small areas with sample size of at least 30 people
51 Distribution of SAEs Small area estimates of proportions of males with any disability.
52 Difficulties with the current method Models required for every variable Surveys have a large number of potential variables of interest Example: disability consultancies Over 50 breakdowns of disability Required fitting 50 GLMMs
53 Difficulties with the current method Big Data = Big Data manipulation Creating the X and Z matrices is expensive Big data sets often lead to an explosion in the number of potential explanatory variables. Difficult to transfer knowledge and experience Knowledge about data and computer systems Knowledge about modelling and analysis
54 INDUSTRIALISATION
55 Industrialisation of SAEs Knowledge Management ABS 2017 Toolset SAE Methods Data Preparation Documentation Integrated SAE System Metadata Retrieval Quality Assurance Diagnostics
56 ALTERNATIVE METHODS
57 Alternative methods Standard statistical output How efficient are direct estimates? Survey data comes with weights. These weights do not depend on the variable of interest Fit one model, produce estimates for all variables
58 Alternative methods Fay Herriot type models Bayesian Bootstrap INLA Weighting methods Reweighting Model Based Direct Estimation BARE
59 Alternative methods Fay Herriot type models Extensive literature Similar to unit level modelling Change small area means new FH model Less efficient
60 Alternative methods Bayesian Bootstrap Pólya's urn model Robust standard errors (e.g. model selection) Use of Monte Carlo less efficient Bayesian analogue of model-assisted method
61 Alternative methods INLA Bayesian approximation for GLMMs Similar to computations used in current method More accurate computations
62 Alternative methods Weight based methods $\hat{Y}_d = \sum_{i \in s} w_{di} y_i$ Weighted sum over the whole sample, not just those in the area/domain. Weights for each unit are different for different areas/domains.
63 Alternative methods Reweighting $\hat{Y}_d = \sum_{i \in s} w_{di} y_i$ One set of weights for each small area. "Hard" calibration on area specific benchmarks. Consistency with survey weights ("weight sharing"): $w_{1i} + w_{2i} + \dots + w_{Di} = W_i$
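The reweighting idea can be sketched with toy numbers: each area has its own weight for every sampled unit, the area estimate is a weighted sum over the whole sample, and weight sharing means the area weights for a unit add up to its survey weight. The data and function names below are invented for illustration, and how the area weights are actually derived is not covered.

```python
def area_estimates(weights_by_area, y):
    """Y_d = sum over the whole sample of w_di * y_i, for each area d."""
    return {
        area: sum(w * yi for w, yi in zip(w_d, y))
        for area, w_d in weights_by_area.items()
    }


def weights_are_shared(weights_by_area, survey_weights, tol=1e-9):
    """True when w_1i + ... + w_Di equals the survey weight W_i for every unit."""
    for i, W_i in enumerate(survey_weights):
        shared = sum(w_d[i] for w_d in weights_by_area.values())
        if abs(shared - W_i) > tol:
            return False
    return True


y = [1, 0, 1]
survey_weights = [10.0, 20.0, 30.0]
weights_by_area = {"area_1": [8.0, 5.0, 0.0], "area_2": [2.0, 15.0, 30.0]}

estimates = area_estimates(weights_by_area, y)
print(estimates)
print(weights_are_shared(weights_by_area, survey_weights))
# With shared weights, area estimates add up to the survey-weighted total.
print(sum(estimates.values()), sum(W * yi for W, yi in zip(survey_weights, y)))
```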
64 Alternative methods Model Based Direct Estimation $\hat{Y}_d = \frac{1}{\sum_{i \in s_d} w_i} \sum_{i \in s_d} w_i y_i$ One set of weights ($w_{di} = w_i$). Based on Linear Mixed Model. Weighted average over sample in the area/domain. Similar to direct estimation. "Hard" calibration on fixed effects, "soft" calibration on random effects.
65 Alternative methods MSE estimation Very difficult for reweighting/BARE. Data sharing = bias up, variance down. Bias hard to quantify. Expect MSE to be smaller. Difficult to quantify how much smaller.
66 Summary Background Current method Industrialisation
67 More information? Pfeffermann, D. (2013) New Important Developments in Small Area Estimation, Statistical Science, 28, 1. A Guide to Small Area Estimation is available on -> Statistical References. Small Area Estimation by J. N. K. Rao. Sean Buttsworth sean.buttsworth@abs.gov.au (02) Peter Radisich peter.radisich@abs.gov.au (02)
Small area estimation by model calibration and "hybrid" calibration. Risto Lehtonen, University of Helsinki Ari Veijanen, Statistics Finland
Small area estimation by model calibration and "hybrid" calibration Risto Lehtonen, University of Helsinki Ari Veijanen, Statistics Finland NTTS Conference, Brussels, 10-12 March 2015 Lehtonen R. and Veijanen
More informationStatistical Matching using Fractional Imputation
Statistical Matching using Fractional Imputation Jae-Kwang Kim 1 Iowa State University 1 Joint work with Emily Berg and Taesung Park 1 Introduction 2 Classical Approaches 3 Proposed method 4 Application:
More informationDual-Frame Sample Sizes (RDD and Cell) for Future Minnesota Health Access Surveys
Dual-Frame Sample Sizes (RDD and Cell) for Future Minnesota Health Access Surveys Steven Pedlow 1, Kanru Xia 1, Michael Davern 1 1 NORC/University of Chicago, 55 E. Monroe Suite 2000, Chicago, IL 60603
More informationSmall area estimation II
Small area estimation II Monica Pratesi and Caterina Giusti Department of Economics and Management, University of Pisa 1 st EMOS Spring School Trier, Pisa, Manchester, Luxembourg, 23-27 March 2015 Structure
More informationStat 500 lab notes c Philip M. Dixon, Week 10: Autocorrelated errors
Week 10: Autocorrelated errors This week, I have done one possible analysis and provided lots of output for you to consider. Case study: predicting body fat Body fat is an important health measure, but
More informationMissing Data and Imputation
Missing Data and Imputation NINA ORWITZ OCTOBER 30 TH, 2017 Outline Types of missing data Simple methods for dealing with missing data Single and multiple imputation R example Missing data is a complex
More informationIntroduction to Mplus
Introduction to Mplus May 12, 2010 SPONSORED BY: Research Data Centre Population and Life Course Studies PLCS Interdisciplinary Development Initiative Piotr Wilk piotr.wilk@schulich.uwo.ca OVERVIEW Mplus
More informationLecture 13: Model selection and regularization
Lecture 13: Model selection and regularization Reading: Sections 6.1-6.2.1 STATS 202: Data mining and analysis October 23, 2017 1 / 17 What do we know so far In linear regression, adding predictors always
More informationA noninformative Bayesian approach to small area estimation
A noninformative Bayesian approach to small area estimation Glen Meeden School of Statistics University of Minnesota Minneapolis, MN 55455 glen@stat.umn.edu September 2001 Revised May 2002 Research supported
More informationGAMs semi-parametric GLMs. Simon Wood Mathematical Sciences, University of Bath, U.K.
GAMs semi-parametric GLMs Simon Wood Mathematical Sciences, University of Bath, U.K. Generalized linear models, GLM 1. A GLM models a univariate response, y i as g{e(y i )} = X i β where y i Exponential
More informationGlobal modelling of air pollution using multiple data sources
Global modelling of air pollution using multiple data sources Matthew Thomas SAMBa, University of Bath Email: M.L.Thomas@bath.ac.uk November 11, 015 1/ 3 OUTLINE Motivation Data Sources Existing Approaches
More informationPredictive Analytics: Demystifying Current and Emerging Methodologies. Tom Kolde, FCAS, MAAA Linda Brobeck, FCAS, MAAA
Predictive Analytics: Demystifying Current and Emerging Methodologies Tom Kolde, FCAS, MAAA Linda Brobeck, FCAS, MAAA May 18, 2017 About the Presenters Tom Kolde, FCAS, MAAA Consulting Actuary Chicago,
More informationEstimation of Item Response Models
Estimation of Item Response Models Lecture #5 ICPSR Item Response Theory Workshop Lecture #5: 1of 39 The Big Picture of Estimation ESTIMATOR = Maximum Likelihood; Mplus Any questions? answers Lecture #5:
More informationTime Series Analysis by State Space Methods
Time Series Analysis by State Space Methods Second Edition J. Durbin London School of Economics and Political Science and University College London S. J. Koopman Vrije Universiteit Amsterdam OXFORD UNIVERSITY
More informationSimulating from the Polya posterior by Glen Meeden, March 06
1 Introduction Simulating from the Polya posterior by Glen Meeden, glen@stat.umn.edu March 06 The Polya posterior is an objective Bayesian approach to finite population sampling. In its simplest form it
More informationTopics in Machine Learning-EE 5359 Model Assessment and Selection
Topics in Machine Learning-EE 5359 Model Assessment and Selection Ioannis D. Schizas Electrical Engineering Department University of Texas at Arlington 1 Training and Generalization Training stage: Utilizing
More informationRESAMPLING METHODS. Chapter 05
1 RESAMPLING METHODS Chapter 05 2 Outline Cross Validation The Validation Set Approach Leave-One-Out Cross Validation K-fold Cross Validation Bias-Variance Trade-off for k-fold Cross Validation Cross Validation
More informationInference for Generalized Linear Mixed Models
Inference for Generalized Linear Mixed Models Christina Knudson, Ph.D. University of St. Thomas October 18, 2018 Reviewing the Linear Model The usual linear model assumptions: responses normally distributed
More informationStatistical Analysis Using Combined Data Sources: Discussion JPSM Distinguished Lecture University of Maryland
Statistical Analysis Using Combined Data Sources: Discussion 2011 JPSM Distinguished Lecture University of Maryland 1 1 University of Michigan School of Public Health April 2011 Complete (Ideal) vs. Observed
More information7. Collinearity and Model Selection
Sociology 740 John Fox Lecture Notes 7. Collinearity and Model Selection Copyright 2014 by John Fox Collinearity and Model Selection 1 1. Introduction I When there is a perfect linear relationship among
More informationEXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY
EXAMINATIONS OF THE ROYAL STATISTICAL SOCIETY GRADUATE DIPLOMA, 2015 MODULE 4 : Modelling experimental data Time allowed: Three hours Candidates should answer FIVE questions. All questions carry equal
More informationUsing Monetary incentives in face-to-face surveys:
Using Monetary incentives in face-to-face surveys: Are prepaid incentives more effective than promised incentives? Michael Blohm & Achim Koch Q2016 - European Conference on Quality in Official Statistics
More informationEnsemble Learning: An Introduction. Adapted from Slides by Tan, Steinbach, Kumar
Ensemble Learning: An Introduction Adapted from Slides by Tan, Steinbach, Kumar 1 General Idea D Original Training data Step 1: Create Multiple Data Sets... D 1 D 2 D t-1 D t Step 2: Build Multiple Classifiers
More informationGlobal modelling of air pollution using multiple data sources
Global modelling of air pollution using multiple data sources Matthew Thomas M.L.Thomas@bath.ac.uk Supervised by Dr. Gavin Shaddick In collaboration with IHME and WHO June 14, 2016 1/ 1 MOTIVATION Air
More informationUsing Machine Learning to Optimize Storage Systems
Using Machine Learning to Optimize Storage Systems Dr. Kiran Gunnam 1 Outline 1. Overview 2. Building Flash Models using Logistic Regression. 3. Storage Object classification 4. Storage Allocation recommendation
More information2017 ITRON EFG Meeting. Abdul Razack. Specialist, Load Forecasting NV Energy
2017 ITRON EFG Meeting Abdul Razack Specialist, Load Forecasting NV Energy Topics 1. Concepts 2. Model (Variable) Selection Methods 3. Cross- Validation 4. Cross-Validation: Time Series 5. Example 1 6.
More informationCHAPTER 1 INTRODUCTION
Introduction CHAPTER 1 INTRODUCTION Mplus is a statistical modeling program that provides researchers with a flexible tool to analyze their data. Mplus offers researchers a wide choice of models, estimators,
More informationHierarchical Generalized Linear Models
Generalized Multilevel Linear Models Introduction to Multilevel Models Workshop University of Georgia: Institute for Interdisciplinary Research in Education and Human Development 07 Generalized Multilevel
More informationSimulation studies. Patrick Breheny. September 8. Monte Carlo simulation Example: Ridge vs. Lasso vs. Subset
Simulation studies Patrick Breheny September 8 Patrick Breheny BST 764: Applied Statistical Modeling 1/17 Introduction In statistics, we are often interested in properties of various estimation and model
More informationBuilding Better Parametric Cost Models
Building Better Parametric Cost Models Based on the PMI PMBOK Guide Fourth Edition 37 IPDI has been reviewed and approved as a provider of project management training by the Project Management Institute
More informationSTATISTICS (STAT) Statistics (STAT) 1
Statistics (STAT) 1 STATISTICS (STAT) STAT 2013 Elementary Statistics (A) Prerequisites: MATH 1483 or MATH 1513, each with a grade of "C" or better; or an acceptable placement score (see placement.okstate.edu).
More informationHILDA PROJECT TECHNICAL PAPER SERIES No. 2/08, February 2008
HILDA PROJECT TECHNICAL PAPER SERIES No. 2/08, February 2008 HILDA Standard Errors: A Users Guide Clinton Hayes The HILDA Project was initiated, and is funded, by the Australian Government Department of
More informationMonte Carlo Simulation. Ben Kite KU CRMDA 2015 Summer Methodology Institute
Monte Carlo Simulation Ben Kite KU CRMDA 2015 Summer Methodology Institute Created by Terrence D. Jorgensen, 2014 What Is a Monte Carlo Simulation? Anything that involves generating random data in a parameter
More informationDATA ANALYSIS USING HIERARCHICAL GENERALIZED LINEAR MODELS WITH R
DATA ANALYSIS USING HIERARCHICAL GENERALIZED LINEAR MODELS WITH R Lee, Rönnegård & Noh LRN@du.se Lee, Rönnegård & Noh HGLM book 1 / 24 Overview 1 Background to the book 2 Crack growth example 3 Contents
More informationVariational Methods for Discrete-Data Latent Gaussian Models
Variational Methods for Discrete-Data Latent Gaussian Models University of British Columbia Vancouver, Canada March 6, 2012 The Big Picture Joint density models for data with mixed data types Bayesian
More informationVignette of the JoSAE package
Vignette of the JoSAE package Johannes Breidenbach 6 October 2011: JoSAE 0.2 1 Introduction The aim in the analysis of sample surveys is frequently to derive estimates of subpopulation characteristics.
More informationCPSC 340: Machine Learning and Data Mining. Logistic Regression Fall 2016
CPSC 340: Machine Learning and Data Mining Logistic Regression Fall 2016 Admin Assignment 1: Marks visible on UBC Connect. Assignment 2: Solution posted after class. Assignment 3: Due Wednesday (at any
More informationBasic facts about Dudley
Basic facts about Dudley This report provides a summary of the latest available information on the demographic and socioeconomic make-up of the 1 Big Local (DRAFT)s in Dudley. It looks at the population
More informationSection 4 General Factorial Tutorials
Section 4 General Factorial Tutorials General Factorial Part One: Categorical Introduction Design-Ease software version 6 offers a General Factorial option on the Factorial tab. If you completed the One
More informationTHIS IS NOT REPRESNTATIVE OF CURRENT CLASS MATERIAL. STOR 455 Midterm 1 September 28, 2010
THIS IS NOT REPRESNTATIVE OF CURRENT CLASS MATERIAL STOR 455 Midterm September 8, INSTRUCTIONS: BOTH THE EXAM AND THE BUBBLE SHEET WILL BE COLLECTED. YOU MUST PRINT YOUR NAME AND SIGN THE HONOR PLEDGE
More informationMachine Learning / Jan 27, 2010
Revisiting Logistic Regression & Naïve Bayes Aarti Singh Machine Learning 10-701/15-781 Jan 27, 2010 Generative and Discriminative Classifiers Training classifiers involves learning a mapping f: X -> Y,
More informationMulticollinearity and Validation CIVL 7012/8012
Multicollinearity and Validation CIVL 7012/8012 2 In Today s Class Recap Multicollinearity Model Validation MULTICOLLINEARITY 1. Perfect Multicollinearity 2. Consequences of Perfect Multicollinearity 3.
More informationStatistics (STAT) Statistics (STAT) 1. Prerequisites: grade in C- or higher in STAT 1200 or STAT 1300 or STAT 1400
Statistics (STAT) 1 Statistics (STAT) STAT 1200: Introductory Statistical Reasoning Statistical concepts for critically evaluation quantitative information. Descriptive statistics, probability, estimation,
More informationVariance Estimation in Presence of Imputation: an Application to an Istat Survey Data
Variance Estimation in Presence of Imputation: an Application to an Istat Survey Data Marco Di Zio, Stefano Falorsi, Ugo Guarnera, Orietta Luzi, Paolo Righi 1 Introduction Imputation is the commonly used
More information22s:152 Applied Linear Regression
22s:152 Applied Linear Regression Chapter 22: Model Selection In model selection, the idea is to find the smallest set of variables which provides an adequate description of the data. We will consider
More informationGraphical Models, Bayesian Method, Sampling, and Variational Inference
Graphical Models, Bayesian Method, Sampling, and Variational Inference With Application in Function MRI Analysis and Other Imaging Problems Wei Liu Scientific Computing and Imaging Institute University
More informationLecture 25: Review I
Lecture 25: Review I Reading: Up to chapter 5 in ISLR. STATS 202: Data mining and analysis Jonathan Taylor 1 / 18 Unsupervised learning In unsupervised learning, all the variables are on equal standing,
More informationINLA: an introduction
INLA: an introduction Håvard Rue 1 Norwegian University of Science and Technology Trondheim, Norway May 2009 1 Joint work with S.Martino (Trondheim) and N.Chopin (Paris) Latent Gaussian models Background
More informationApplying multiple imputation on waterbird census data Comparing two imputation methods
Applying multiple imputation on waterbird census data Comparing two imputation methods ISEC, Montpellier, 1 july 2014 Thierry Onkelinx, Koen Devos & Paul Quataert Thierry.Onkelinx@inbo.be Contents 1 Introduction
More informationFrequencies, Unequal Variance Weights, and Sampling Weights: Similarities and Differences in SAS
ABSTRACT Paper 1938-2018 Frequencies, Unequal Variance Weights, and Sampling Weights: Similarities and Differences in SAS Robert M. Lucas, Robert M. Lucas Consulting, Fort Collins, CO, USA There is confusion
More informationLecture 27, April 24, Reading: See class website. Nonparametric regression and kernel smoothing. Structured sparse additive models (GroupSpAM)
School of Computer Science Probabilistic Graphical Models Structured Sparse Additive Models Junming Yin and Eric Xing Lecture 7, April 4, 013 Reading: See class website 1 Outline Nonparametric regression
More informationSupervised vs unsupervised clustering
Classification Supervised vs unsupervised clustering Cluster analysis: Classes are not known a- priori. Classification: Classes are defined a-priori Sometimes called supervised clustering Extract useful
More informationDATA ANALYSIS USING HIERARCHICAL GENERALIZED LINEAR MODELS WITH R
DATA ANALYSIS USING HIERARCHICAL GENERALIZED LINEAR MODELS WITH R Lee, Rönnegård & Noh LRN@du.se Lee, Rönnegård & Noh HGLM book 1 / 25 Overview 1 Background to the book 2 A motivating example from my own
More informationDual-Frame Weights (Landline and Cell) for the 2009 Minnesota Health Access Survey
Dual-Frame Weights (Landline and Cell) for the 2009 Minnesota Health Access Survey Kanru Xia 1, Steven Pedlow 1, Michael Davern 1 1 NORC/University of Chicago, 55 E. Monroe Suite 2000, Chicago, IL 60603
More informationPredicting poverty from satellite imagery
Predicting poverty from satellite imagery Neal Jean, Michael Xie, Stefano Ermon Department of Computer Science Stanford University Matt Davis, Marshall Burke, David Lobell Department of Earth Systems Science
More informationOutline. Topic 16 - Other Remedies. Ridge Regression. Ridge Regression. Ridge Regression. Robust Regression. Regression Trees. Piecewise Linear Model
Topic 16 - Other Remedies Ridge Regression Robust Regression Regression Trees Outline - Fall 2013 Piecewise Linear Model Bootstrapping Topic 16 2 Ridge Regression Modification of least squares that addresses
More informationDetecting and Circumventing Collinearity or Ill-Conditioning Problems
Chapter 8 Detecting and Circumventing Collinearity or Ill-Conditioning Problems Section 8.1 Introduction Multicollinearity/Collinearity/Ill-Conditioning The terms multicollinearity, collinearity, and ill-conditioning
More informationLinear Model Selection and Regularization. especially usefull in high dimensions p>>100.
Linear Model Selection and Regularization especially usefull in high dimensions p>>100. 1 Why Linear Model Regularization? Linear models are simple, BUT consider p>>n, we have more features than data records
More informationModelling and Quantitative Methods in Fisheries
SUB Hamburg A/553843 Modelling and Quantitative Methods in Fisheries Second Edition Malcolm Haddon ( r oc) CRC Press \ y* J Taylor & Francis Croup Boca Raton London New York CRC Press is an imprint of
More informationGeneralized Additive Models
:p Texts in Statistical Science Generalized Additive Models An Introduction with R Simon N. Wood Contents Preface XV 1 Linear Models 1 1.1 A simple linear model 2 Simple least squares estimation 3 1.1.1
More informationCHAPTER 7 EXAMPLES: MIXTURE MODELING WITH CROSS- SECTIONAL DATA
Examples: Mixture Modeling With Cross-Sectional Data CHAPTER 7 EXAMPLES: MIXTURE MODELING WITH CROSS- SECTIONAL DATA Mixture modeling refers to modeling with categorical latent variables that represent
More informationCross-validation. Cross-validation is a resampling method.
Cross-validation Cross-validation is a resampling method. It refits a model of interest to samples formed from the training set, in order to obtain additional information about the fitted model. For example,
More informationFeature Subset Selection for Logistic Regression via Mixed Integer Optimization
Feature Subset Selection for Logistic Regression via Mixed Integer Optimization Yuichi TAKANO (Senshu University, Japan) Toshiki SATO (University of Tsukuba) Ryuhei MIYASHIRO (Tokyo University of Agriculture
More informationA Bayesian analysis of survey design parameters for nonresponse, costs and survey outcome variable models
A Bayesian analysis of survey design parameters for nonresponse, costs and survey outcome variable models Eva de Jong, Nino Mushkudiani and Barry Schouten ASD workshop, November 6-8, 2017 Outline Bayesian
More informationRSM Split-Plot Designs & Diagnostics Solve Real-World Problems
RSM Split-Plot Designs & Diagnostics Solve Real-World Problems Shari Kraber Pat Whitcomb Martin Bezener Stat-Ease, Inc. Stat-Ease, Inc. Stat-Ease, Inc. 221 E. Hennepin Ave. 221 E. Hennepin Ave. 221 E.
More informationIvy s Business Analytics Foundation Certification Details (Module I + II+ III + IV + V)