Analysis of Complex Survey Data with SAS
Christine R. Wells, Ph.D., UCLA, Los Angeles, CA

ABSTRACT

Data collected via a complex sampling design differ from data collected via other methods, and these differences need to be taken into account in the analysis. The elements of the data unique to complex survey data are defined and discussed. Examples of procedures for descriptive statistics and graphs are given for continuous and categorical variables. The analysis of domains, sometimes called subpopulations, is discussed, followed by examples of ordinary least squares regression and logistic regression.

INTRODUCTION

The most important part of the analysis of complex survey data is correctly specifying the elements of the sampling plan. These elements indicate how the data collection process differed from a simple random sample. The math used to calculate the point estimates and standard errors is different for data collected via a simple random sample and data collected via a complex survey. There are numerous methods by which complex survey data can be collected, and the correct math depends on how the data were collected. Because of this, it is important to read the documentation for the data carefully.

When data are collected via a simple random sample, all elements of the population have an equal probability of being selected into the sample. This assumption is built into the math that underlies most of the procedures in SAS. With complex survey data, the analyst explicitly acknowledges that the data were not collected via a simple random sample.

The most common way in which complex survey data differ from data collected via a simple random sample is that the elements of the population do not have an equal probability of being selected into the sample. With complex survey data, a variable is included in the data set that gives the inverse of the probability of selection for each observation. This is called the probability weight.
Often, corrections and adjustments are made to this weight, and the result is called a sampling weight. This variable is incorporated into the calculation of all weighted point estimates (e.g., means, frequencies, regression coefficients). The sum of the sampling weights should give a reasonable estimate of the number of elements in the population.

The population may be stratified before data collection begins. This means that the population is broken up into groups such that each element of the population belongs to one and only one stratum. For example, the population of the United States may be stratified by location (such as states), or by demographic characteristics such as gender or age. Many categorical variables may be combined to create the strata. If the stratification variables are related to the outcome variable, the stratification will reduce the standard errors. However, for most analyses with public-use survey data sets, the stratification may decrease or increase the standard errors.

Another element common to complex survey data sets that influences the calculation of the standard errors is clustering. In practice, it is very difficult to obtain a simple random sample. Instead, cluster sampling is used. In cluster sampling, large units are selected first. In single-stage cluster sampling, all of the elements within each selected cluster are included in the sample. In multiple-stage cluster sampling, large clusters are sampled from the population, and then smaller clusters are sampled from each large cluster that has been selected into the sample. This process continues until elements are selected. For example, metropolitan statistical areas (MSAs) may be sampled, and then city blocks, and then households, and then a person. The effect of cluster sampling is usually to increase the standard errors.

In summary, the sampling weight affects the calculation of the point estimate, and the stratification and the clustering affect the calculation of the standard error.
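The role of the weight in the point estimates can be written out explicitly. The following is only a sketch in notation that does not appear in the paper: \pi_i is assumed to be the selection probability of sampled observation i, w_i its probability weight before any adjustments, y_i the analysis variable, and s the set of sampled observations.

```latex
w_i = \frac{1}{\pi_i}, \qquad
\hat{N} = \sum_{i \in s} w_i, \qquad
\bar{y}_w = \frac{\sum_{i \in s} w_i \, y_i}{\sum_{i \in s} w_i}
```

Here \hat{N} is the estimated population size and \bar{y}_w is the weighted mean. Neither expression involves the strata or the clusters, which is consistent with the summary above: those design elements enter only through the standard error.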
Why be concerned with the standard error? The reason is that most test statistics are computed as the point estimate divided by its standard error, so an incorrect standard error leads to an incorrect test.

For most public-use data sets, a stratification variable and a cluster variable are included in the data set. The names of these variables can be determined by reading the documentation that comes with the data set. The cluster variable is sometimes called the primary sampling unit (PSU). The primary sampling unit refers to the first level of sampling. For example, if MSAs are sampled, and then blocks within the selected MSAs, and then households on the selected blocks, the primary sampling unit is the MSA. Usually, there are two or more clusters in each stratum. If there is only one cluster in a stratum, SAS will put a note in the log file.

Another way to correct the standard errors for the sampling plan is to use replicate weights, which are a series of variables that are included with the survey data set. Replicate weights are not discussed further in this paper.
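As a rough illustration of how the three design elements described above are passed to a survey procedure, the sketch below uses PROC SURVEYMEANS. The data set name (mysurvey) and all variable names (samp_wt, stratum_id, psu_id, outcome) are placeholders invented for this sketch; the actual names have to be taken from the documentation of the survey being analyzed.

```sas
/* Sketch only: data set and variable names are hypothetical placeholders. */
proc surveymeans data = mysurvey;
  weight  samp_wt;     /* sampling weight: used for the point estimates    */
  strata  stratum_id;  /* stratification variable: used for standard errors */
  cluster psu_id;      /* primary sampling unit: used for standard errors   */
  var     outcome;     /* continuous analysis variable                      */
run;
```

The WEIGHT, STRATA and CLUSTER statements are also available in PROC SURVEYFREQ, PROC SURVEYREG and PROC SURVEYLOGISTIC, so the same three lines would typically be repeated in each analysis.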
INTRODUCTION TO THE EXAMPLE DATA SET

For the following examples, the National Health and Nutrition Examination Survey (NHANES) data will be used. Variables from the demographics, body measurement, and diet behavior and nutrition data sets will be used. For variables in which missing data codes (such as 7, 8 and 9) were used, those values have been converted to missing for the purposes of these analyses.

DESCRIPTIVE STATISTICS WITH CONTINUOUS VARIABLES

PROC SURVEYMEANS is the SAS procedure that is most often used to calculate descriptive statistics for continuous variables. A wide variety of descriptive statistics can be produced, including means, medians and percentiles. Graphs that correctly account for the sampling weight, such as histograms and boxplots, can also be produced. As of SAS/STAT 14.2, weighted correlations cannot be calculated. Some examples are given below.

In the first example, only the basic specification will be used.

In the next example, some options are included on the PROC SURVEYMEANS statement. These options request that the minimum value, mean, maximum value and range be included in the output.

    proc surveymeans data = nhanes2012 min mean max range;

In the next example, graphs are requested. The graphs will include a histogram and a boxplot.

    ods graphics on;
    proc surveymeans data = nhanes2012 plots = all;
    ods graphics off;

DESCRIPTIVE STATISTICS WITH CATEGORICAL VARIABLES

PROC SURVEYFREQ is the SAS procedure that is most often used to calculate descriptive statistics for categorical variables. One- and two-way tabulations can be produced. Several different types of chi-squared tests can be calculated for two-way cross-tabulations. Graphs, including kappa, mosaic, odds ratio, relative risk, risk difference, weighted frequency and weighted kappa plots, can be produced. Some examples are given below.

The first example shows the basic syntax.

    proc surveyfreq data = nhanes2012;
    tables female;

In the next example, a cross-tabulation will be requested.
A FORMAT statement will also be included. This is useful for making the output more easily interpretable.
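The formats fm. and cb. are applied in the paper's examples, but their definitions are not shown. A minimal sketch of how such formats could be defined with PROC FORMAT follows. The numeric codes and the labels are assumptions made for illustration (in particular, that female and cbq600r are both coded 0/1); the only detail taken from the paper is that 'male' is one of the labels for female, since it is used as a reference level in the regression examples.

```sas
/* Sketch only: codes and labels are assumed, not taken from the paper. */
proc format;
  value fm 0 = 'male'
           1 = 'female';
  value cb 0 = 'no'
           1 = 'yes';
run;
```

Once defined, the formats are attached inside a procedure with a FORMAT statement, and the output shows the labels instead of the numeric codes.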
    proc surveyfreq data = nhanes2012;
    tables female*cbq600r;
    format female fm. cbq600r cb.;

In the next example, the expected values, row and column percentages, and different versions of the chi-square test will be requested.

    proc surveyfreq data = nhanes2012;
    tables female*cbq600r;
    format female fm. cbq600r cb.;

In the following example, a weighted frequency plot will be requested.

    ods graphics on;
    proc surveyfreq data = nhanes2012;
    tables dmdeduc2 / plots = wtfreqplot;
    format dmdeduc2 ed.;
    ods graphics off;

ANALYSIS OF DOMAINS

Sometimes an analysis should include only some members of the population. For example, perhaps the analysis should include only women, or only women over age 50. Such analyses are called domain, or subpopulation, analyses. If the data were not weighted, the analyst could use a WHERE statement to include only the desired observations. If the analysis were to be done for women and men separately, a BY statement could be used. However, the math that is used with a WHERE statement or a BY statement produces standard errors whose interpretation is different from the one that most analysts are seeking. Because of this, most of the survey procedures have a DOMAIN statement. One or more categorical variables can be specified on the DOMAIN statement, and SAS will run the analysis for every combination of the variables listed. There is no DOMAIN statement in PROC SURVEYFREQ. Instead, the variables that would have been specified on the DOMAIN statement can be added to the TABLES statement.

What is the difference between the way a BY statement (or a WHERE statement) and a DOMAIN statement calculates the standard errors? Let us take a simple example with a single binary variable on a BY statement. With the BY statement, the data set is broken into two groups. Both the point estimate and its standard error are calculated using only the observations in that group.
In contrast, with a DOMAIN statement, the point estimate is calculated using only the observations that are part of the domain, but all of the data are used in the calculation of the standard error. This method of calculating the standard error allows the results of the analysis to be generalized to all elements of the subpopulation in the population.

SAS produces output for every combination of the variables listed on the DOMAIN statement, and this can sometimes mean a great deal of output. Binary domain variables can be created in a data step. Typically, such a variable would be coded 1 for all observations that are to be included in the domain, and 0 otherwise. No observation should have a missing value.

The following examples will start with the simplest situation, in which only one variable is specified on the DOMAIN statement, and progress to more complicated, but potentially more useful, uses.

    domain female;
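The advice above about binary domain variables can be sketched as follows, using the "women over age 50" example from the text. The variables female, ridageyr and htfeet and the data set nhanes2012 appear in the paper; the new names (nhanes2012_dom, fem50), the design variable names (samp_wt, stratum_id, psu_id) and the assumption that female is coded 1 for women are hypothetical and used only for illustration.

```sas
/* Sketch only: fem50, nhanes2012_dom and the design variable names
   are hypothetical; female = 1 for women is an assumption.        */
data nhanes2012_dom;
  set nhanes2012;
  /* A logical expression evaluates to 1 (true) or 0 (false), so fem50
     is never missing. Observations with a missing age or sex fail the
     comparison and fall into the 0 group.                             */
  fem50 = (female = 1 and ridageyr > 50);
run;

proc surveymeans data = nhanes2012_dom mean;
  weight  samp_wt;
  strata  stratum_id;
  cluster psu_id;
  var     htfeet;
  domain  fem50;   /* estimates for fem50 = 0 and fem50 = 1, using the full design */
run;
```

Because the full data set is passed to the procedure and the subgroup is selected only through the DOMAIN statement, all of the data remain available for the calculation of the standard error, which is the behavior described above.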
In the next example, a FORMAT statement will be used.

    domain female;

In the next example, two categorical variables will be given on the DOMAIN statement, but those variables will not be crossed.

    domain female adult;
    format female fm. adult ad.;

The variables on the DOMAIN statement are crossed in the next example, and a FORMAT statement is used.

    domain female*adult;
    format female fm. adult ad.;

The difference between means from two domains can be tested. There are at least two ways to do this using PROC SURVEYREG. One method uses a CONTRAST statement, and the other uses an LSMEANS statement. Notice that on the MODEL statement, the NOINT option has been used so that both means are estimated, rather than the intercept (which is the mean of the reference group) and the difference between the means. PROC SURVEYMEANS and PROC SURVEYREG calculate the variance estimates differently. The VADJUST = NONE option is used to get the variance estimates that are given by PROC SURVEYMEANS. The SOLUTION option is used to request the point estimates in the output. PROC SURVEYMEANS is shown first only to show the mean height for each gender.

    domain female;

    proc surveyreg data = nhanes2012;
    class female;
    model htfeet = female / noint solution vadjust = none;
    contrast 'comparing males and females' female 1 -1;
    proc surveyreg data = nhanes2012;
    class female;
    model htfeet = female / noint solution vadjust = none;
    lsmeans female / diff;

ORDINARY LEAST SQUARES REGRESSION

Once the elements of the sampling plan have been taken into account, ordinary least squares (OLS) regression with complex survey data is very much like OLS regression with unweighted data. PROC SURVEYREG is used to run OLS regression with complex survey data. As before, if only part of the data is to be included in the analysis, a DOMAIN statement can be used. There is a CLASS statement in PROC SURVEYREG. How reference categories are specified depends on the presence or absence of a FORMAT statement.

    proc surveyreg data = nhanes2012;
    class female (reference = 'male');
    model htfeet = female ridageyr / solution;

LOGISTIC REGRESSION

PROC SURVEYLOGISTIC can be used to conduct binary, ordinal and nominal (i.e., multinomial) logistic regression analyses. There is a CLASS statement in PROC SURVEYLOGISTIC. How reference categories are specified depends on the presence or absence of a FORMAT statement.

    proc surveylogistic data = nhanes2012;
    class dmdeduc2 (reference = '3') female (reference = 'male') / param = ref;
    model cbq600r (desc) = dmdeduc2 female ridageyr;

WEIGHTED MULTILEVEL MODELS

Starting with SAS/STAT 13.1, weighted multilevel models can be run with PROC GLIMMIX, and it is the procedure that must be used for such models. Both linear and nonlinear models can be specified. Running a weighted linear multilevel model is more complicated than running a weighted OLS regression. First, sampling weights at each level of the multilevel model must be specified. For example, for a two-level model, sampling weights must be specified at level 1 and level 2. This is because the level 1 and level 2 sampling weights enter into the equation for the pseudo-likelihood at different places. Additionally, consideration should be given to the scaling of the level 1 sampling weights. There are a few different choices for scaling methods.
If rescaling needs to be done, it must be done in a data step before running PROC GLIMMIX. It is assumed that the highest level of the multilevel model corresponds to the primary sampling units of the sampling plan. For example, if MSAs are the PSUs, then MSAs should be the level 2 units in a two-level multilevel model. The METHOD = QUADRATURE option is used to request the weighted likelihood, which is called a pseudo-likelihood. The EMPIRICAL = CLASSICAL option is used to request sandwich variance estimators.
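The paper does not say which scaling method to use, so the following is only a sketch of one common choice: rescaling the level 1 weights so that, within each level 2 unit, they sum to the number of sampled level 1 observations in that unit. The names wishing, level1wt and level2id are taken from the PROC GLIMMIX example in this paper; wt_sums, wt_sum, n_j, wishing_scaled and level1wt_scaled are hypothetical names introduced here.

```sas
/* Sketch only: one of several possible scaling methods.
   Output data set and new variable names are hypothetical. */
proc sort data = wishing;
  by level2id;
run;

/* Sum of the level 1 weights and number of observations per level 2 unit */
proc means data = wishing noprint;
  by level2id;
  var level1wt;
  output out = wt_sums (drop = _type_ _freq_) sum = wt_sum n = n_j;
run;

/* Rescale so that the level 1 weights sum to n_j within each level 2 unit */
data wishing_scaled;
  merge wishing wt_sums;
  by level2id;
  level1wt_scaled = level1wt * n_j / wt_sum;
run;
```

Under this sketch, the rescaled variable (level1wt_scaled) and the new data set (wishing_scaled) would take the place of level1wt and wishing in the OBSWEIGHT = option and the DATA = option of PROC GLIMMIX.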
    proc glimmix data = wishing method = quadrature(qpoints=10) empirical = classical;
    model dv = IV1 IV2 IV3 / obsweight = level1wt solution;
    random intercept / subject = level2id weight = level2wt;

CONCLUSION

Analyzing data collected via a complex sampling design is different from analyzing data collected via a simple random sample. However, there are also many similarities. Special attention should be given to the elements of the sampling plan to ensure that they are properly incorporated into the analysis. The analysis of subgroups of observations should be done with a DOMAIN statement. Many types of weighted regressions are possible, including weighted linear and nonlinear multilevel models.

REFERENCES

Heeringa, S. G., West, B. T., and Berglund, P. A. (2017). Applied Survey Data Analysis, Second Edition. Boca Raton, FL: CRC Press.

Lewis, T. H. (2017). Complex Survey Data Analysis with SAS. Boca Raton, FL: CRC Press.

Zhu, M. (2014). Analyzing Multilevel Models with the GLIMMIX Procedure. In Proceedings of the SAS Global Forum 2014 Conference. Cary, NC: SAS Institute Inc.

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the author at:

Christine Wells, Ph.D.
UCLA
5308 Math Sciences Box
Los Angeles, CA
crwells@ucla.edu

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.
Frequencies, Unequal Variance Weights, and Sampling Weights: Similarities and Differences in SAS
ABSTRACT Paper 1938-2018 Frequencies, Unequal Variance Weights, and Sampling Weights: Similarities and Differences in SAS Robert M. Lucas, Robert M. Lucas Consulting, Fort Collins, CO, USA There is confusion
More informationCorrectly Compute Complex Samples Statistics
SPSS Complex Samples 15.0 Specifications Correctly Compute Complex Samples Statistics When you conduct sample surveys, use a statistics package dedicated to producing correct estimates for complex sample
More informationMISSING DATA AND MULTIPLE IMPUTATION
Paper 21-2010 An Introduction to Multiple Imputation of Complex Sample Data using SAS v9.2 Patricia A. Berglund, Institute For Social Research-University of Michigan, Ann Arbor, Michigan ABSTRACT This
More informationCorrectly Compute Complex Samples Statistics
PASW Complex Samples 17.0 Specifications Correctly Compute Complex Samples Statistics When you conduct sample surveys, use a statistics package dedicated to producing correct estimates for complex sample
More informationBACKGROUND INFORMATION ON COMPLEX SAMPLE SURVEYS
Analysis of Complex Sample Survey Data Using the SURVEY PROCEDURES and Macro Coding Patricia A. Berglund, Institute For Social Research-University of Michigan, Ann Arbor, Michigan ABSTRACT The paper presents
More informationSAS/STAT 13.1 User s Guide. The SURVEYFREQ Procedure
SAS/STAT 13.1 User s Guide The SURVEYFREQ Procedure This document is an individual chapter from SAS/STAT 13.1 User s Guide. The correct bibliographic citation for the complete manual is as follows: SAS
More informationNHANES June Introduction. Data information & loading data. Using dynamic data within a typical classroom
NHANES June 2016 Introduction The NHANES data come from the National Health and Nutrition Examination Survey, surveys given nationwide by the Center for Disease Controls (CDC). The data are collected to
More informationCHAPTER 7 EXAMPLES: MIXTURE MODELING WITH CROSS- SECTIONAL DATA
Examples: Mixture Modeling With Cross-Sectional Data CHAPTER 7 EXAMPLES: MIXTURE MODELING WITH CROSS- SECTIONAL DATA Mixture modeling refers to modeling with categorical latent variables that represent
More information3.6 Sample code: yrbs_data <- read.spss("yrbs07.sav",to.data.frame=true)
InJanuary2009,CDCproducedareportSoftwareforAnalyisofYRBSdata, describingtheuseofsas,sudaan,stata,spss,andepiinfoforanalyzingdatafrom theyouthriskbehaviorssurvey. ThisreportprovidesthesameinformationforRandthesurveypackage.Thetextof
More informationSAS/STAT 14.3 User s Guide The SURVEYFREQ Procedure
SAS/STAT 14.3 User s Guide The SURVEYFREQ Procedure This document is an individual chapter from SAS/STAT 14.3 User s Guide. The correct bibliographic citation for this manual is as follows: SAS Institute
More informationPoisson Regressions for Complex Surveys
Poisson Regressions for Complex Surveys Overview Researchers often use sample survey methodology to obtain information about a large population by selecting and measuring a sample from that population.
More informationAcknowledgments. Acronyms
Acknowledgments Preface Acronyms xi xiii xv 1 Basic Tools 1 1.1 Goals of inference 1 1.1.1 Population or process? 1 1.1.2 Probability samples 2 1.1.3 Sampling weights 3 1.1.4 Design effects. 5 1.2 An introduction
More informationHILDA PROJECT TECHNICAL PAPER SERIES No. 2/08, February 2008
HILDA PROJECT TECHNICAL PAPER SERIES No. 2/08, February 2008 HILDA Standard Errors: A Users Guide Clinton Hayes The HILDA Project was initiated, and is funded, by the Australian Government Department of
More informationIntroduction to Mplus
Introduction to Mplus May 12, 2010 SPONSORED BY: Research Data Centre Population and Life Course Studies PLCS Interdisciplinary Development Initiative Piotr Wilk piotr.wilk@schulich.uwo.ca OVERVIEW Mplus
More informationCHAPTER 1 INTRODUCTION
Introduction CHAPTER 1 INTRODUCTION Mplus is a statistical modeling program that provides researchers with a flexible tool to analyze their data. Mplus offers researchers a wide choice of models, estimators,
More informationPaper SDA-11. Logistic regression will be used for estimation of net error for the 2010 Census as outlined in Griffin (2005).
Paper SDA-11 Developing a Model for Person Estimation in Puerto Rico for the 2010 Census Coverage Measurement Program Colt S. Viehdorfer, U.S. Census Bureau, Washington, DC This report is released to inform
More informationResearch Methods for Business and Management. Session 8a- Analyzing Quantitative Data- using SPSS 16 Andre Samuel
Research Methods for Business and Management Session 8a- Analyzing Quantitative Data- using SPSS 16 Andre Samuel A Simple Example- Gym Purpose of Questionnaire- to determine the participants involvement
More informationSTA 570 Spring Lecture 5 Tuesday, Feb 1
STA 570 Spring 2011 Lecture 5 Tuesday, Feb 1 Descriptive Statistics Summarizing Univariate Data o Standard Deviation, Empirical Rule, IQR o Boxplots Summarizing Bivariate Data o Contingency Tables o Row
More informationAcquisition Description Exploration Examination Understanding what data is collected. Characterizing properties of data.
Summary Statistics Acquisition Description Exploration Examination what data is collected Characterizing properties of data. Exploring the data distribution(s). Identifying data quality problems. Selecting
More informationData Quality Control: Using High Performance Binning to Prevent Information Loss
SESUG Paper DM-173-2017 Data Quality Control: Using High Performance Binning to Prevent Information Loss ABSTRACT Deanna N Schreiber-Gregory, Henry M Jackson Foundation It is a well-known fact that the
More information1. Basic Steps for Data Analysis Data Editor. 2.4.To create a new SPSS file
1 SPSS Guide 2009 Content 1. Basic Steps for Data Analysis. 3 2. Data Editor. 2.4.To create a new SPSS file 3 4 3. Data Analysis/ Frequencies. 5 4. Recoding the variable into classes.. 5 5. Data Analysis/
More informationWant to Do a Better Job? - Select Appropriate Statistical Analysis in Healthcare Research
Want to Do a Better Job? - Select Appropriate Statistical Analysis in Healthcare Research Liping Huang, Center for Home Care Policy and Research, Visiting Nurse Service of New York, NY, NY ABSTRACT The
More informationCoding Categorical Variables in Regression: Indicator or Dummy Variables. Professor George S. Easton
Coding Categorical Variables in Regression: Indicator or Dummy Variables Professor George S. Easton DataScienceSource.com This video is embedded on the following web page at DataScienceSource.com: DataScienceSource.com/DummyVariables
More informationThe SURVEYREG Procedure
SAS/STAT 9.2 User s Guide The SURVEYREG Procedure (Book Excerpt) SAS Documentation This document is an individual chapter from SAS/STAT 9.2 User s Guide. The correct bibliographic citation for the complete
More informationSD10 A SAS MACRO FOR PERFORMING BACKWARD SELECTION IN PROC SURVEYREG
Paper SD10 A SAS MACRO FOR PERFORMING BACKWARD SELECTION IN PROC SURVEYREG Qixuan Chen, University of Michigan, Ann Arbor, MI Brenda Gillespie, University of Michigan, Ann Arbor, MI ABSTRACT This paper
More informationRight-click on whatever it is you are trying to change Get help about the screen you are on Help Help Get help interpreting a table
Q Cheat Sheets What to do when you cannot figure out how to use Q What to do when the data looks wrong Right-click on whatever it is you are trying to change Get help about the screen you are on Help Help
More informationTelephone Survey Response: Effects of Cell Phones in Landline Households
Telephone Survey Response: Effects of Cell Phones in Landline Households Dennis Lambries* ¹, Michael Link², Robert Oldendick 1 ¹University of South Carolina, ²Centers for Disease Control and Prevention
More informationResearch with Large Databases
Research with Large Databases Key Statistical and Design Issues and Software for Analyzing Large Databases John Ayanian, MD, MPP Ellen P. McCarthy, PhD, MPH Society of General Internal Medicine Chicago,
More informationProduct Catalog. AcaStat. Software
Product Catalog AcaStat Software AcaStat AcaStat is an inexpensive and easy-to-use data analysis tool. Easily create data files or import data from spreadsheets or delimited text files. Run crosstabulations,
More informationSAS/STAT 14.2 User s Guide. The SURVEYREG Procedure
SAS/STAT 14.2 User s Guide The SURVEYREG Procedure This document is an individual chapter from SAS/STAT 14.2 User s Guide. The correct bibliographic citation for this manual is as follows: SAS Institute
More informationChapter 17: INTERNATIONAL DATA PRODUCTS
Chapter 17: INTERNATIONAL DATA PRODUCTS After the data processing and data analysis, a series of data products were delivered to the OECD. These included public use data files and codebooks, compendia
More informationSAS/STAT 14.1 User s Guide. The SURVEYREG Procedure
SAS/STAT 14.1 User s Guide The SURVEYREG Procedure This document is an individual chapter from SAS/STAT 14.1 User s Guide. The correct bibliographic citation for this manual is as follows: SAS Institute
More informationSPSS TRAINING SPSS VIEWS
SPSS TRAINING SPSS VIEWS Dataset Data file Data View o Full data set, structured same as excel (variable = column name, row = record) Variable View o Provides details for each variable (column in Data
More information186 Statistics, Data Analysis and Modeling. Proceedings of MWSUG '95
A Statistical Analysis Macro Library in SAS Carl R. Haske, Ph.D., STATPROBE, nc., Ann Arbor, M Vivienne Ward, M.S., STATPROBE, nc., Ann Arbor, M ABSTRACT Statistical analysis plays a major role in pharmaceutical
More informationData Quality Control for Big Data: Preventing Information Loss With High Performance Binning
Data Quality Control for Big Data: Preventing Information Loss With High Performance Binning ABSTRACT Deanna Naomi Schreiber-Gregory, Henry M Jackson Foundation, Bethesda, MD It is a well-known fact that
More informationSTAT10010 Introductory Statistics Lab 2
STAT10010 Introductory Statistics Lab 2 1. Aims of Lab 2 By the end of this lab you will be able to: i. Recognize the type of recorded data. ii. iii. iv. Construct summaries of recorded variables. Calculate
More informationJMP Clinical. Release Notes. Version 5.0
JMP Clinical Version 5.0 Release Notes Creativity involves breaking out of established patterns in order to look at things in a different way. Edward de Bono JMP, A Business Unit of SAS SAS Campus Drive
More informationData analysis using Microsoft Excel
Introduction to Statistics Statistics may be defined as the science of collection, organization presentation analysis and interpretation of numerical data from the logical analysis. 1.Collection of Data
More informationPaper CC-016. METHODOLOGY Suppose the data structure with m missing values for the row indices i=n-m+1,,n can be re-expressed by
Paper CC-016 A macro for nearest neighbor Lung-Chang Chien, University of North Carolina at Chapel Hill, Chapel Hill, NC Mark Weaver, Family Health International, Research Triangle Park, NC ABSTRACT SAS
More informationUSER S GUIDE LATENT GOLD 4.0. Innovations. Statistical. Jeroen K. Vermunt & Jay Magidson. Thinking outside the brackets! TM
LATENT GOLD 4.0 USER S GUIDE Jeroen K. Vermunt & Jay Magidson Statistical Innovations Thinking outside the brackets! TM For more information about Statistical Innovations Inc. please visit our website
More informationGetting Up to Speed with PROC REPORT Kimberly LeBouton, K.J.L. Computing, Rossmoor, CA
SESUG 2012 Paper HW-01 Getting Up to Speed with PROC REPORT Kimberly LeBouton, K.J.L. Computing, Rossmoor, CA ABSTRACT Learning the basics of PROC REPORT can help the new SAS user avoid hours of headaches.
More informationSAS/STAT 13.2 User s Guide. The SURVEYLOGISTIC Procedure
SAS/STAT 13.2 User s Guide The SURVEYLOGISTIC Procedure This document is an individual chapter from SAS/STAT 13.2 User s Guide. The correct bibliographic citation for the complete manual is as follows:
More informationMean Tests & X 2 Parametric vs Nonparametric Errors Selection of a Statistical Test SW242
Mean Tests & X 2 Parametric vs Nonparametric Errors Selection of a Statistical Test SW242 Creation & Description of a Data Set * 4 Levels of Measurement * Nominal, ordinal, interval, ratio * Variable Types
More informationCHAPTER 5. BASIC STEPS FOR MODEL DEVELOPMENT
CHAPTER 5. BASIC STEPS FOR MODEL DEVELOPMENT This chapter provides step by step instructions on how to define and estimate each of the three types of LC models (Cluster, DFactor or Regression) and also
More informationHierarchical Generalized Linear Models
Generalized Multilevel Linear Models Introduction to Multilevel Models Workshop University of Georgia: Institute for Interdisciplinary Research in Education and Human Development 07 Generalized Multilevel
More informationSAS/STAT 14.2 User s Guide. The SURVEYIMPUTE Procedure
SAS/STAT 14.2 User s Guide The SURVEYIMPUTE Procedure This document is an individual chapter from SAS/STAT 14.2 User s Guide. The correct bibliographic citation for this manual is as follows: SAS Institute
More informationDr. Barbara Morgan Quantitative Methods
Dr. Barbara Morgan Quantitative Methods 195.650 Basic Stata This is a brief guide to using the most basic operations in Stata. Stata also has an on-line tutorial. At the initial prompt type tutorial. In
More informationInternational data products
International data products Public use files... 376 Codebooks for the PISA 2015 public use data files... 377 Data compendia tables... 378 Data analysis and software tools... 378 International Database
More informationFurther Maths Notes. Common Mistakes. Read the bold words in the exam! Always check data entry. Write equations in terms of variables
Further Maths Notes Common Mistakes Read the bold words in the exam! Always check data entry Remember to interpret data with the multipliers specified (e.g. in thousands) Write equations in terms of variables
More informationPreparing for Data Analysis
Preparing for Data Analysis Prof. Andrew Stokes March 21, 2017 Managing your data Entering the data into a database Reading the data into a statistical computing package Checking the data for errors and
More informationAdjusting for Unequal Selection Probability in Multilevel Models: A Comparison of Software Packages
Adusting for Unequal Selection Probability in Multilevel Models: A Comparison of Software Packages Kim Chantala Chirayath Suchindran Carolina Population Center, UNC at Chapel Hill Carolina Population Center,