Research with Large Databases
1 Research with Large Databases: Key Statistical and Design Issues and Software for Analyzing Large Databases
John Ayanian, MD, MPP; Ellen P. McCarthy, PhD, MPH
Society of General Internal Medicine, Chicago, 2004
2 Today's Objectives
- Practical issues in data management and potential pitfalls
- Overview of sampling designs
- Key analytic issues to consider when analyzing data with complex sampling designs
- Overview of command language for SUDAAN
- Similarities and differences between SAS and SUDAAN
- What to look for in survey documentation before beginning analyses
3 Practical Issues in Data Management of Large Databases
- Convert ASCII (raw) data to SAS database
- Construct smallest analytic file possible
- Merge or concatenate files (if necessary)
- Identify study sample
- Select variables of interest
- What to look for in documentation of complex surveys
- Common sources of error using SUDAAN
4 Commonly Used Statistical Software such as SAS
- Useful to prepare data for analysis
  - Merge and combine information from multiple data sources
  - Subset data to identify the sample that you want to study
  - Create and recode variables
- Limited number of procedures to properly analyze data from complex survey designs in version 8+
5 SUDAAN: Software for the Statistical Analysis of Correlated Data
- Analyzes data from complex sample designs, including repeated measures, multistage samples, and cluster-correlated data
- Computes appropriate standard errors that account for the sample design
- Flexible: includes many design options (SRS, WR, WOR)
- SAS-Callable SUDAAN: execute SUDAAN procedures within SAS
6 Identify Appropriate Statistical Software for Analyses
- SAS: SEER Public Use, SEER-Medicare
- SAS-callable SUDAAN: NHIS, NAMCS, NIS, BRFSS
7 Construct Smallest Analytic File Possible
- SAS: Analytic file should include ONLY study sample and variables of interest
- SUDAAN: Analytic file requires ALL observations to perform analyses, but limit the number of variables
- Limit variables using the DROP or KEEP statement in your DATA step
    DROP age--race diag1-diag10;
    KEEP age--race diag1-diag10;
8 Merging Data Files
Example using NHIS: You want to take a few variables from the person file (e.g., health insurance status) and merge them onto the sample adult core file
- Person File: N=100,000
- Adult Sample Core: N=32,000
9 Merging Data Files, Continued: Steps to Merging in SAS
1) Each file must be sorted by the merge variable(s)
2) Select only the variables you want to add into your analytic file
3) Merge files in a new DATA step
4) Whenever possible, construct a permanent dataset with the added variables
Note: SUDAAN users must re-sort data by the variables listed in the NEST statement
10 Merging Data Files: Example SAS Code
Goal: Link insurance from the person file to the sample adult file
    DATA person;
      SET IN.PERSON00 (KEEP=IDNUM INSURE);
    PROC SORT; BY IDNUM;
    RUN;
    DATA adult;
      SET IN.ADULT00;
    PROC SORT; BY IDNUM;
    RUN;
    DATA OUT.newADULT;
      MERGE person adult (IN=case);
      BY IDNUM;
      IF case;   /* retains only subjects in the adult file! */
    PROC SORT; BY STRATUM PSU;   /* re-sort by the NEST variables for SUDAAN */
    RUN;
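The merge logic on this slide can also be sketched outside SAS. The Python fragment below is an illustration only, not part of the original deck: it mimics `MERGE ... IF case` as a lookup keyed on IDNUM, so every adult record is kept and persons absent from the adult file are dropped. The record values are invented; only the field names IDNUM, INSURE, STRATUM, and PSU come from the slide.

```python
# Hypothetical toy records; field names follow the slide, values are made up.
person = [
    {"IDNUM": 1, "INSURE": 1},
    {"IDNUM": 2, "INSURE": 2},
    {"IDNUM": 3, "INSURE": 1},
]
adult = [
    {"IDNUM": 3, "STRATUM": 2, "PSU": 1},
    {"IDNUM": 1, "STRATUM": 1, "PSU": 2},
]


def merge_keep_adults(person_rows, adult_rows):
    """Attach INSURE to every adult record; persons not in the adult file are dropped."""
    insure_by_id = {row["IDNUM"]: row["INSURE"] for row in person_rows}
    merged = []
    for row in adult_rows:
        new_row = dict(row)
        # None marks an adult with no matching person record (like a SAS missing value).
        new_row["INSURE"] = insure_by_id.get(row["IDNUM"])
        merged.append(new_row)
    # Re-sort by the design variables, mirroring the final PROC SORT on the slide.
    merged.sort(key=lambda r: (r["STRATUM"], r["PSU"]))
    return merged


new_adult = merge_keep_adults(person, adult)
```

Person 2 does not appear in `new_adult` because that IDNUM is not in the adult file, which is the behavior the `IF case;` line enforces in SAS.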
11 Concatenate Data Files
Example using NAMCS: You want to combine data from different survey years (e.g., 1998 to 2000) into a single dataset (i.e., 75,000 visits)
- Year 1998: N=25,000
- Year 1999: N=25,000
- Year 2000: N=25,000
12 Concatenate Data Files: Example SAS Code
Goal: Increase sample size for specific subgroup estimates by combining multiple years of data
    DATA OUT.namc9800;
      SET IN.NAMC98 IN.NAMC99 IN.NAMC00;
    PROC SORT;
      BY stratm psum subfile prostrat year provider dept su clinic;
    RUN;
Which weight do I use? Often you will want to use WEIGHT/3 to get an average annual weight.
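The WEIGHT/3 advice can be made concrete with a small sketch. The Python below is an illustration under assumptions, not the presenters' code: `pool_years` and the ANNUAL_WT field are hypothetical names, and the weights are invented. It stacks three yearly files and divides each weight by the number of years pooled, so that the weights sum to an average annual total rather than a three-year total.

```python
def pool_years(files, weight_field="WEIGHT"):
    """Stack yearly files and add ANNUAL_WT = WEIGHT / number of years pooled."""
    n_years = len(files)
    pooled = []
    for year, rows in files.items():
        for row in rows:
            new_row = dict(row)
            new_row["YEAR"] = year  # keep YEAR so it can be used in the design
            new_row["ANNUAL_WT"] = row[weight_field] / n_years
            pooled.append(new_row)
    return pooled


# Invented weights for three survey years.
files = {
    1998: [{"WEIGHT": 3000.0}, {"WEIGHT": 1500.0}],
    1999: [{"WEIGHT": 2400.0}],
    2000: [{"WEIGHT": 2100.0}],
}
pooled = pool_years(files)
total_all_years = sum(r["WEIGHT"] for r in pooled)       # three-year total
average_annual = sum(r["ANNUAL_WT"] for r in pooled)     # average annual total
```

Using the undivided weight would estimate the total number of visits over all three years, which is rarely the quantity of interest.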
13 Working with ICD-9-CM Diagnosis and Procedure Codes
- Identify all codes of interest
- Codes available at [URL not preserved]
- Example: Identify cases of AMI (diagnosis code 410.xx) or CHF (diagnosis code 428.x) in hospital discharge abstracts
- In SAS, use ARRAYS and DO LOOPS to search all of the diagnosis fields
14 Sample SAS Code: Working with ICD-9-CM Diagnosis Codes
Example: Identifying cases of AMI or CHF across 10 potential diagnosis fields
    /* assumes DIAG1-DIAG10 are character fields */
    ARRAY diag(10) DIAG1-DIAG10;
    ami=2; chf=2;   /* 1 = present, 2 = absent */
    DO i=1 TO 10;
      IF SUBSTR(diag(i),1,3) = '410' THEN ami=1;
      ELSE IF SUBSTR(diag(i),1,3) = '428' THEN chf=1;
    END;
Similar code can be used to identify procedures of interest across multiple procedure fields.
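For readers who want to check the scanning logic without SAS, here is a minimal Python sketch of the same idea. It assumes diagnosis codes are stored as strings without the decimal point (for example "41001"); the function name `flag_ami_chf` is hypothetical, and the 1/2 coding follows the slide.

```python
def flag_ami_chf(diagnoses):
    """Return (ami, chf) using the slide's coding: 1 = present, 2 = absent."""
    ami, chf = 2, 2
    for code in diagnoses:
        if not code:  # skip empty or missing diagnosis fields
            continue
        prefix = code.strip()[:3]  # first three characters identify the condition
        if prefix == "410":
            ami = 1
        elif prefix == "428":
            chf = 1
    return ami, chf


both = flag_ami_chf(["41001", "4280", ""])
neither = flag_ami_chf(["25000", None])
```

Because the flags are only ever switched from 2 to 1, a record with the condition in any one of its diagnosis fields is counted as a case.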
15 Identify Study Sample: SAS
POTENTIAL PITFALL: SAS and SUDAAN differ in the way you subset data
Example: Identify women age 50-69 years
- SAS: smallest analytic file possible
- Apply inclusion (and exclusion) criteria to your data and DELETE the subjects that do not meet criteria
    IF SEX=2 AND (AGE GE 50 AND AGE LE 69);
16 Identify Study Sample: SUDAAN
- SUDAAN requires all observations to be present to compute variance estimates
- Apply inclusion (and exclusion) criteria to your data and construct an INDICATOR variable to identify subjects who meet inclusion criteria
    IF SEX=2 AND (AGE GE 50 AND AGE LE 69) THEN SAMPLE=1;
    ELSE SAMPLE=2;
- SUBPOPN maintains the integrity of the study design while allowing analysis on only a subgroup of observations (subjects/respondents)
    SUBPOPN SAMPLE=1;
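The difference between deleting records and flagging them can be sketched in a few lines of Python. This is an illustration only, with invented records and hypothetical function names; SEX=2 and the 50 to 69 age range follow the slides.

```python
def subset_by_deletion(rows):
    """SAS-style subsetting: rows that fail the criteria are removed."""
    return [r for r in rows if r["SEX"] == 2 and 50 <= r["AGE"] <= 69]


def add_sample_flag(rows):
    """SUDAAN-style subsetting: every row is kept; SAMPLE=1 marks the subgroup, 2 the rest."""
    flagged = []
    for r in rows:
        in_sample = r["SEX"] == 2 and 50 <= r["AGE"] <= 69
        flagged.append({**r, "SAMPLE": 1 if in_sample else 2})
    return flagged


# Invented respondents.
rows = [
    {"SEX": 2, "AGE": 55},
    {"SEX": 1, "AGE": 60},
    {"SEX": 2, "AGE": 72},
    {"SEX": 2, "AGE": 50},
]
deleted = subset_by_deletion(rows)
flagged = add_sample_flag(rows)
```

The flagged file still contains all four records, so the strata and PSUs of the full design remain available for variance estimation, while `SAMPLE == 1` selects the same two women that deletion would have kept.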
17 Overview of Sampling
- Goal: To obtain a study sample that is representative of the population of interest
- Sample designs are commonly used to control or minimize survey costs, facilitate survey administration, and improve estimates for subgroups
- Most complex sampling designs tend to produce LARGER variances relative to a Simple Random Sample
18 Overview of Sampling
- Many national and other large surveys use sampling methods to identify respondents
- Examples of sampling designs include:
  - Simple Random Sampling
  - Systematic Sampling
  - Stratified Sampling
  - Cluster Sampling
19 Examples of Sample Designs
National Health Interview Survey uses a stratified multistage sample design to identify a nationally representative civilian household population
- State-level stratification (Primary Sampling Units or PSUs)
- Census blocks within strata (Secondary Sampling Units or SSUs)
  - Allows for over-sampling specific minority groups (African Americans and Hispanics)
- Households
- Persons within households
  - One adult and one child
- Details at [URL not preserved]
20 Variance Estimation and Test Statistics
- Failure to take complex sampling designs into account will produce biased variance estimates and test statistics
- Analyzing survey data with a complex sampling design under the assumptions of a Simple Random Sample will typically produce smaller p-values and narrower confidence intervals, and so increase the likelihood of a Type I error
- Most standard statistical packages do not take the sample design into consideration when computing variances and test statistics
  - SAS version 8+ has limited capabilities
21 Sample Weights
- Each respondent has a sampling weight equal to: WEIGHT = 1/(probability of selection)
- Weights are often adjusted for age, race/ethnicity, and non-response in national surveys
- Weights are used to inflate the sample back to the population of interest
- Sum of WEIGHTS = size of the population of interest (e.g., non-institutionalized civilian US population)
- Failure to take weights into account will produce biased point estimates (proportions, means, regression coefficients)
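A small numeric sketch may help show why ignoring weights biases point estimates. The Python below is illustrative only: the selection probabilities and outcomes are invented, and `weighted_proportion` is a hypothetical helper. Two respondents come from an over-sampled group (selection probability 0.01) and two from an under-sampled group (0.002).

```python
def weighted_proportion(outcomes, weights, target=1):
    """Share of the weighted population whose outcome equals `target`."""
    total_weight = sum(weights)
    return sum(w for y, w in zip(outcomes, weights) if y == target) / total_weight


# Invented design: WEIGHT = 1 / (probability of selection).
probabilities = [0.01, 0.01, 0.002, 0.002]
weights = [1 / p for p in probabilities]

# Invented outcome, coded 1 = yes, 2 = no.
outcomes = [1, 1, 2, 2]

population_size = sum(weights)  # the sample "inflated" back to the population
unweighted = sum(1 for y in outcomes if y == 1) / len(outcomes)
weighted = weighted_proportion(outcomes, weights)
```

Here the unweighted proportion is one half, but the two "yes" respondents each represent far fewer people than the two "no" respondents, so the weighted proportion is much smaller.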
22 Commonly Used Procedures
    SAS                 SAS-Callable SUDAAN
    PROC FREQ;          PROC CROSSTAB;
    PROC MEANS;         PROC DESCRIPT;
    PROC UNIVARIATE;    PROC DESCRIPT;
    PROC LOGISTIC;      PROC RLOGIST;
23 SUDAAN Language
- DESIGN
- NEST
- WEIGHT
NOTE: This information is always provided in the survey documentation!
24 SUDAAN Language: WEIGHT
- Inverse probability of selection
- Commonly adjusted for age, race/ethnicity, nonresponse
- REQUIRED to produce unbiased estimates and national estimates
    WEIGHT WTFA_AD;
25 SUDAAN Language: DESIGN
- Specifies sample design
- Most common:
  - WR: With Replacement (NHIS, NIS, MEPS)
  - WOR: Without Replacement (NAMCS)
    DESIGN=WR;
- Leads to specific variance estimation procedures
26 SUDAAN Language: NEST
- Specifies sampling levels or stages in your design
- Most common:
  - STRATA: stratification variable
  - PSU (primary sampling unit): primary cluster variable
    NEST STRATUM PSU;
NOTE: Data set MUST be SORTED by the variables as listed on the NEST statement
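To show why the strata and PSU variables matter, the sketch below computes a weighted mean and a standard error using the textbook Taylor-linearization estimator for a with-replacement design. This is an assumption-laden illustration in Python, not SUDAAN's actual implementation: the function name and the row keys `stratum`, `psu`, `weight`, and `y` are hypothetical, the data are invented, and strata with a single PSU are simply rejected.

```python
from collections import defaultdict


def weighted_mean_with_se(rows):
    """Weighted mean and linearized standard error, treating PSUs as sampled with replacement."""
    total_w = sum(r["weight"] for r in rows)
    mean = sum(r["weight"] * r["y"] for r in rows) / total_w

    # Sum the linearized values within each PSU, grouped by stratum.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for r in rows:
        z = r["weight"] * (r["y"] - mean) / total_w
        psu_totals[r["stratum"]][r["psu"]] += z

    # Between-PSU variability within each stratum, summed over strata.
    variance = 0.0
    for psus in psu_totals.values():
        n_h = len(psus)
        if n_h < 2:
            raise ValueError("each stratum needs at least two PSUs")
        z_bar = sum(psus.values()) / n_h
        variance += n_h / (n_h - 1) * sum((z - z_bar) ** 2 for z in psus.values())
    return mean, variance ** 0.5


values = [1.0, 2.0, 3.0, 4.0]
# Every respondent is its own PSU: equivalent to a simple random sample.
srs_rows = [{"stratum": 1, "psu": i, "weight": 1.0, "y": y} for i, y in enumerate(values)]
# Same values, but similar respondents are clustered into two PSUs.
cluster_rows = [{"stratum": 1, "psu": i // 2, "weight": 1.0, "y": y} for i, y in enumerate(values)]

srs_mean, srs_se = weighted_mean_with_se(srs_rows)
cluster_mean, cluster_se = weighted_mean_with_se(cluster_rows)
```

Both designs give the same mean, but the clustered design yields a larger standard error, which is the kind of design effect that is lost when survey data are analyzed as a simple random sample.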
27 Example Code: SAS Version of Bivariable Analysis
    PROC FREQ;
      TABLES race*mammo / CHISQ;
      WEIGHT wtfa_ad;
    RUN;
SAS without the WEIGHT statement provides:
- ACCURATE sample size estimates
- INACCURATE point estimates, test statistics, and p-values!
SAS with the WEIGHT statement provides:
- ACCURATE point estimates
- INACCURATE test statistics and p-values!
28 Example Code: SUDAAN Version of Bivariable Analysis
    PROC CROSSTAB data=work.nhis filetype=sas DESIGN=WR;
      NEST stratum psu / missunit;
      SUBPOPN sample=1;
      WEIGHT wtfa_ad;
      SUBGROUP race mammo;
      LEVELS 3 2;
      TABLES race*mammo;
      SETENV colwidth=9 decwidth=2 colspce=2;
      PRINT nsum wsum rowper colper / wsumfmt=f9.0 nsumfmt=f9.0;
      PRINT chisq chisqdf chisqp / chisqdffmt=f8.0 chisqpfmt=f8.4 style=nchs;
    RUN;
29 SAS Results: PROC FREQ (panels: Without WEIGHT; With WEIGHT)
30 SAS Versus SUDAAN Results (panels: SAS with WEIGHT; SUDAAN)
31 Which WEIGHT do I use in my analysis?
Example using NHIS. Goal: Representative of US population
- PERSON File (N=100,000): WTFA
- Sample Adult Core and Cancer Control Module (N=32,000): WTFA_SA
- Sample Child Core (N=16,000): WTFA_SC
32 Example Code: SUDAAN With SUBPOPN Statement
    PROC CROSSTAB data=work.nhis filetype=sas DESIGN=WR;
      NEST stratum psu / missunit;
      SUBPOPN sample=1;
      WEIGHT wtfa_ad;
      SUBGROUP race mammo;
      LEVELS 3 2;
      TABLES race*mammo;
      SETENV colwidth=9 decwidth=2 colspce=2;
      PRINT nsum wsum rowper colper / wsumfmt=f9.0 nsumfmt=f9.0;
      PRINT chisq chisqdf chisqp / chisqdffmt=f8.0 chisqpfmt=f8.4 style=nchs;
    RUN;
33 Requirements of SUDAAN: Quirks/Common Sources of Error
- Input dataset MUST be sorted by the design variables listed in the NEST statement
- All observations must be present to compute variance estimates
  - Use SUBPOPN statement
- All analytic variables MUST be NONZERO (1, 2, 3, ...)
  - EXCEPTION: Outcome of a logistic regression model must be coded as 0,1
- MISSUNIT option in NEST statement for most surveys
    NEST stratum psu / missunit;
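Two of the requirements above, sort order and nonzero coding, are easy to check before a file is handed to SUDAAN. The Python sketch below is a hypothetical pre-flight check, not a SUDAAN feature: the function name, its parameters, and the sample records are all invented for illustration, and it assumes the coded variables are categorical with integer codes.

```python
def check_sudaan_ready(rows, nest_vars, class_vars):
    """Return a list of problems: wrong sort order, or categorical codes below 1."""
    problems = []

    # The file must already be ordered by the NEST variables, in the order listed.
    keys = [tuple(r[v] for v in nest_vars) for r in rows]
    if keys != sorted(keys):
        problems.append("not sorted by " + " ".join(nest_vars))

    # Categorical analytic variables should be coded 1, 2, 3, ... (never 0).
    for var in class_vars:
        if any(not isinstance(r[var], int) or r[var] < 1 for r in rows):
            problems.append(var + " has codes outside 1, 2, 3, ...")
    return problems


good = [
    {"stratum": 1, "psu": 1, "race": 1, "mammo": 2},
    {"stratum": 1, "psu": 2, "race": 3, "mammo": 1},
    {"stratum": 2, "psu": 1, "race": 2, "mammo": 1},
]
bad = [
    {"stratum": 2, "psu": 1, "race": 2, "mammo": 0},
    {"stratum": 1, "psu": 2, "race": 3, "mammo": 1},
]

good_report = check_sudaan_ready(good, ["stratum", "psu"], ["race", "mammo"])
bad_report = check_sudaan_ready(bad, ["stratum", "psu"], ["race", "mammo"])
```

A logistic regression outcome coded 0/1 would be a deliberate exception and should be left out of `class_vars`.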
34 Complex Surveys: What to look for in documentation
- What is the survey design (NEST)?
  - Strata, PSU
  - If you combine multiple years of a survey, then YEAR must be considered in the design
- How were respondents sampled?
  - With replacement (WR) or without replacement (WOR)
- Which WEIGHT variable do I use?
  - Final analysis weight (final basic weight)
  - Use weight from smallest component of survey
More informationEXAMPLE 3: MATCHING DATA FROM RESPONDENTS AT 2 OR MORE WAVES (LONG FORMAT)
EXAMPLE 3: MATCHING DATA FROM RESPONDENTS AT 2 OR MORE WAVES (LONG FORMAT) DESCRIPTION: This example shows how to combine the data on respondents from the first two waves of Understanding Society into
More informationA Bayesian analysis of survey design parameters for nonresponse, costs and survey outcome variable models
A Bayesian analysis of survey design parameters for nonresponse, costs and survey outcome variable models Eva de Jong, Nino Mushkudiani and Barry Schouten ASD workshop, November 6-8, 2017 Outline Bayesian
More informationLandline and Cell Phone Usage Patterns in a Large Urban Setting: Results from the 2008 New York City Community Health Survey
Landline and Cell Phone Usage Patterns in a Large Urban Setting: Results from the 2008 New York City Community Health Survey Stephen Immerwahr 1, Donna Eisenhower 1, Michael Sanderson 1 Michael P. Battaglia
More informationUsing The System For Medical Data Processing And Event Analysis: An Overview
Using The SAS@ System For Medical Data Processing And Event Analysis: An Overview Harald Pitz, Frankfurt University Hospital Marcus Frenz, Frankfurt University Hospital Hans-Peter Howaldt, Frankfurt University
More informationBring Your Own Device and the 2020 Census Research & Testing
Bring Your Own Device and the 2020 Census Research & Testing Ryan King, Evan Moffett, Jennifer Hunter Childs, Jay Occhiogrosso, Scott Williams U.S. Census Bureau 4600 Silver Hill Rd, Suitland, MD 20746
More informationModule I: Clinical Trials a Practical Guide to Design, Analysis, and Reporting 1. Fundamentals of Trial Design
Module I: Clinical Trials a Practical Guide to Design, Analysis, and Reporting 1. Fundamentals of Trial Design Randomized the Clinical Trails About the Uncontrolled Trails The protocol Development The
More informationThe SURVEYSELECT Procedure
SAS/STAT 9.2 User s Guide The SURVEYSELECT Procedure (Book Excerpt) SAS Documentation This document is an individual chapter from SAS/STAT 9.2 User s Guide. The correct bibliographic citation for the complete
More informationSample: n=2,252 national adults, age 18 and older, including 1,127 cell phone interviews Interviewing dates:
Survey Questions Spring 2013 Tracking Survey Final Topline 5/21/2013 Data for April 17-May 19, 2013 Princeton Survey Research Associates International for the Pew Research Center s Internet & American
More informationThe partial Package. R topics documented: October 16, Version 0.1. Date Title partial package. Author Andrea Lehnert-Batar
The partial Package October 16, 2006 Version 0.1 Date 2006-09-21 Title partial package Author Andrea Lehnert-Batar Maintainer Andrea Lehnert-Batar Depends R (>= 2.0.1),e1071
More informationPaper SDA-11. Logistic regression will be used for estimation of net error for the 2010 Census as outlined in Griffin (2005).
Paper SDA-11 Developing a Model for Person Estimation in Puerto Rico for the 2010 Census Coverage Measurement Program Colt S. Viehdorfer, U.S. Census Bureau, Washington, DC This report is released to inform
More informationIntroduction to SAS. I. Understanding the basics In this section, we introduce a few basic but very helpful commands.
Center for Teaching, Research and Learning Research Support Group American University, Washington, D.C. Hurst Hall 203 rsg@american.edu (202) 885-3862 Introduction to SAS Workshop Objective This workshop
More informationWorkshop Calibration tools for Survey Statisticians
Workshop Calibration tools for Survey Statisticians Comparison of 3 calibration softwares Guillaume Chauvet Jean-Claude Deville Mohammed El Haj Tirari Josiane Le Guennec CREST-ENSAI, France 1/60 09/09/05
More information2. Don t forget semicolons and RUN statements The two most common programming errors.
Randy s SAS hints March 7, 2013 1. Always begin your programs with internal documentation. * ***************** * Program =test1, Randy Ellis, March 8, 2013 ***************; 2. Don t forget semicolons and
More informationBUSINESS DECISION MAKING. Topic 1 Introduction to Statistical Thinking and Business Decision Making Process; Data Collection and Presentation
BUSINESS DECISION MAKING Topic 1 Introduction to Statistical Thinking and Business Decision Making Process; Data Collection and Presentation (Chap 1 The Nature of Probability and Statistics) (Chap 2 Frequency
More informationComparative Evaluation of Synthetic Dataset Generation Methods
Comparative Evaluation of Synthetic Dataset Generation Methods Ashish Dandekar, Remmy A. M. Zen, Stéphane Bressan December 12, 2017 1 / 17 Open Data vs Data Privacy Open Data Helps crowdsourcing the research
More informationHow to Use the Cancer-Rates.Info/NJ
How to Use the Cancer-Rates.Info/NJ Web- Based Incidence and Mortality Mapping and Inquiry Tool to Obtain Statewide and County Cancer Statistics for New Jersey Cancer Incidence and Mortality Inquiry System
More informationUsing Mixed-Mode Contacts in Client Surveys: Getting More Bang for Your Buck
June 2013 Volume 51 Number 3 Article # 3FEA1 Using Mixed-Mode Contacts in Client Surveys: Getting More Bang for Your Buck Abstract Surveys are commonly used in Extension to identify client needs or evaluate
More information