SAS/STAT 13.1 User's Guide: The SURVEYFREQ Procedure


This document is an individual chapter from SAS/STAT 13.1 User's Guide.

The correct bibliographic citation for the complete manual is as follows: SAS Institute Inc. 2013. SAS/STAT 13.1 User's Guide. Cary, NC: SAS Institute Inc.

Copyright © 2013, SAS Institute Inc., Cary, NC, USA

All rights reserved. Produced in the United States of America.

For a hard-copy book: No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, or otherwise, without the prior written permission of the publisher, SAS Institute Inc.

For a web download or e-book: Your use of this publication shall be governed by the terms established by the vendor at the time you acquire this publication.

The scanning, uploading, and distribution of this book via the Internet or any other means without the permission of the publisher is illegal and punishable by law. Please purchase only authorized electronic editions and do not participate in or encourage electronic piracy of copyrighted materials. Your support of others' rights is appreciated.

U.S. Government License Rights; Restricted Rights: The Software and its documentation is commercial computer software developed at private expense and is provided with RESTRICTED RIGHTS to the United States Government. Use, duplication or disclosure of the Software by the United States Government is subject to the license terms of this Agreement pursuant to, as applicable, FAR 12.212, DFAR 227.7202-1(a), DFAR 227.7202-3(a) and DFAR 227.7202-4 and, to the extent required under U.S. federal law, the minimum restricted rights as set out in FAR 52.227-19 (DEC 2007). If FAR 52.227-19 is applicable, this provision serves as notice under clause (c) thereof and no other notice is required to be affixed to the Software or documentation. The Government's rights in Software and documentation shall be only those set forth in this Agreement.

SAS Institute Inc., SAS Campus Drive, Cary, North Carolina 27513-2414.

December 2013

SAS provides a complete selection of books and electronic products to help customers use SAS software to its fullest potential. For more information about our offerings, visit support.sas.com/bookstore or call 1-800-727-3228.

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

Other brand and product names are trademarks of their respective companies.


Chapter 94
The SURVEYFREQ Procedure

Contents

   Overview: SURVEYFREQ Procedure
   Getting Started: SURVEYFREQ Procedure
   Syntax: SURVEYFREQ Procedure
      PROC SURVEYFREQ Statement
      BY Statement
      CLUSTER Statement
      REPWEIGHTS Statement
      STRATA Statement
      TABLES Statement
      WEIGHT Statement
   Details: SURVEYFREQ Procedure
      Specifying the Sample Design
      Domain Analysis
      Missing Values
      Statistical Computations
         Variance Estimation
         Definitions and Notation
         Totals
         Covariance of Totals
         Proportions
         Row and Column Proportions
         Balanced Repeated Replication (BRR)
         The Jackknife Method
         Confidence Limits for Totals
         Confidence Limits for Proportions
         Degrees of Freedom
         Coefficient of Variation
         Design Effect
         Expected Weighted Frequency
         Risks and Risk Difference
         Odds Ratio and Relative Risks
         Kappa Coefficients
         Rao-Scott Chi-Square Test
         Rao-Scott Likelihood Ratio Chi-Square Test
         Wald Chi-Square Test
         Wald Log-Linear Chi-Square Test
      Output Data Sets
      Displayed Output
      ODS Table Names
      ODS Graphics
   Examples: SURVEYFREQ Procedure
      Example 94.1: Two-Way Tables
      Example 94.2: Multiway Tables (Domain Analysis)
      Example 94.3: Output Data Sets
   References

Overview: SURVEYFREQ Procedure

The SURVEYFREQ procedure produces one-way to n-way frequency and crosstabulation tables from sample survey data. These tables include estimates of population totals, population proportions, and their standard errors. Confidence limits, coefficients of variation, and design effects are also available. The procedure provides a variety of options to customize the table display.

For one-way frequency tables, PROC SURVEYFREQ provides Rao-Scott chi-square goodness-of-fit tests, which are adjusted for the sample design. You can test a null hypothesis of equal proportions for a one-way frequency table, or you can input custom null hypothesis proportions for the test. For two-way tables, PROC SURVEYFREQ provides design-adjusted tests of independence, or no association, between the row and column variables. These tests include the Rao-Scott chi-square test, the Rao-Scott likelihood ratio test, the Wald chi-square test, and the Wald log-linear chi-square test. For 2 × 2 tables, PROC SURVEYFREQ computes estimates and confidence limits for risks (row proportions), the risk difference, the odds ratio, and relative risks.

PROC SURVEYFREQ computes variance estimates based on the sample design used to obtain the survey data. The design can be a complex multistage survey design with stratification, clustering, and unequal weighting. PROC SURVEYFREQ provides a choice of variance estimation methods, which include Taylor series linearization, balanced repeated replication (BRR), and the jackknife.

PROC SURVEYFREQ uses ODS Graphics to create graphs as part of its output. For general information about ODS Graphics, see Chapter 21, "Statistical Graphics Using ODS." For specific information about the statistical graphics available with the SURVEYFREQ procedure, see the PLOTS= option in the TABLES statement and the section "ODS Graphics."

Getting Started: SURVEYFREQ Procedure

The following example shows how you can use PROC SURVEYFREQ to analyze sample survey data. The example uses data from a customer satisfaction survey for a student information system (SIS), which is a software product that provides modules for student registration, class scheduling, attendance, grade reporting, and other functions.

The software company conducted a survey of school personnel who use the SIS. A probability sample of SIS users was selected from the study population, which included SIS users at middle schools and high schools in the three-state area of Georgia, South Carolina, and North Carolina.

The sample design for this survey was a two-stage stratified design. A first-stage sample of schools was selected from the list of schools in the three-state area that use the SIS. The list of schools (the first-stage sampling frame) was stratified by state and by customer status (whether the school was a new user of the system or a renewal user). Within the first-stage strata, schools were selected with probability proportional to size and with replacement, where the size measure was school enrollment. From each sample school, five staff members were randomly selected to complete the SIS satisfaction questionnaire. These staff members included three teachers and two administrators or guidance department members.

The SAS data set SIS_Survey contains the survey results, as well as the sample design information needed to analyze the data. This data set includes an observation for each school staff member responding to the survey. The variable Response contains the staff member's response about overall satisfaction with the system. The variable State contains the school's state, and the variable NewUser contains the school's customer status ('New Customer' or 'Renewal Customer'). These two variables determine the first-stage strata from which schools were selected. The variable School contains the school identification code and identifies the first-stage sampling units (clusters). The variable SamplingWeight contains the overall sampling weight for each respondent. Overall sampling weights were computed from the selection probabilities at each stage of sampling and were adjusted for nonresponse.

Other variables in the data set SIS_Survey include SchoolType and Department. The variable SchoolType identifies the school as a high school or a middle school. The variable Department identifies the staff member as a teacher, or an administrator or guidance department member.

The following PROC SURVEYFREQ statements request a one-way frequency table for the variable Response:

   title 'Student Information System Survey';
   proc surveyfreq data=sis_survey;
      tables Response;
      strata State NewUser;
      cluster School;
      weight SamplingWeight;
   run;

The PROC SURVEYFREQ statement invokes the procedure and identifies the input data set to be analyzed. The TABLES statement requests a one-way frequency table for the variable Response. The table request syntax for PROC SURVEYFREQ is very similar to the table request syntax for PROC FREQ. This example shows a request for a single one-way table, but you can also request two-way tables and multiway tables. As in PROC FREQ, you can request more than one table in the same TABLES statement, and you can use multiple TABLES statements in the same invocation of the procedure.

The STRATA, CLUSTER, and WEIGHT statements provide sample design information for the procedure, so that the analysis is done according to the sample design used for the survey, and the estimates apply to the study population. The STRATA statement names the variables State and NewUser, which identify the first-stage strata. Note that the design for this example also includes stratification at the second stage of selection (by type of school personnel), but you specify only the first-stage strata for PROC SURVEYFREQ. The CLUSTER statement names the variable School, which identifies the clusters (primary sampling units). The WEIGHT statement names the sampling weight variable.
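As noted above, a single TABLES statement can carry several table requests, and one invocation can contain several TABLES statements. The following is a minimal sketch of that usage, assuming the same SIS_Survey data set and design variables; the particular combination of requests is illustrative only and does not come from the guide.

```sas
proc surveyfreq data=sis_survey;
   /* two one-way tables requested in one TABLES statement */
   tables Response Department;
   /* a second TABLES statement that requests a two-way table */
   tables SchoolType * Response;
   /* the design statements are the same as in the one-way example */
   strata State NewUser;
   cluster School;
   weight SamplingWeight;
run;
```

Because the design statements apply to the whole step, every requested table is analyzed under the same stratification, clustering, and weighting.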

Figure 94.1 and Figure 94.2 display the output produced by PROC SURVEYFREQ, which includes the Data Summary table and the one-way table, Table of Response. The Data Summary table is produced by default unless you specify the NOSUMMARY option. This table shows there are 6 strata, 370 clusters or schools, and 1850 observations (respondents) in the SIS_Survey data set. The sum of the sampling weights is approximately 39,000, which estimates the total number of school personnel in the study area that use the SIS.

Figure 94.1 SIS_Survey Data Summary

   Student Information System Survey
   The SURVEYFREQ Procedure

   Data Summary
   Number of Strata                   6
   Number of Clusters               370
   Number of Observations          1850
   Sum of Weights            38899.6482

Figure 94.2 displays the one-way table of Response, which provides estimates of the population total (weighted frequency) and the population percentage for each category (level) of the variable Response. The response level 'Very Unsatisfied' has a frequency of 304, which means that 304 sample respondents fall into this category. It is estimated that 17.17% of all school personnel in the study population fall into this category, and the standard error of this estimate is 1.29%. Note that the estimates apply to the population of all SIS users in the study area, as opposed to describing only the sample of 1850 respondents. The estimate of the total number of school personnel that are 'Very Unsatisfied' is 6,678, with a standard deviation of 502. The standard errors computed by PROC SURVEYFREQ are based on the multistage stratified design of the survey. This differs from some of the traditional analysis procedures, which assume the design is simple random sampling from an infinite population.

Figure 94.2 One-Way Table of Response

   Table of Response

                                  Weighted   Std Dev of              Std Err of
   Response          Frequency   Frequency     Wgt Freq    Percent      Percent
   ------------------------------------------------------------------------------
   Very Unsatisfied        304        6678    501.61039    17.1676       1.2872
   Unsatisfied             326        6907    495.94101    17.7564       1.2712
   Neutral                 581       12291    617.20147    31.5965       1.5795
   Satisfied               455        9309    572.27868    23.9311       1.4761
   Very Satisfied          184        3714    370.66577     9.5483       0.9523
   Total                  1850       38900    129.85268    100.000
   ------------------------------------------------------------------------------
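As a quick check on how the columns of Figure 94.2 relate, each percentage is the weighted frequency divided by the sum of the weights. The arithmetic below is only a sketch that uses the rounded values displayed in the table, so the trailing digits can differ slightly from the printed percentages.

```latex
\[
\hat{p}_{\text{Very Unsatisfied}}
  = \frac{\text{weighted frequency}}{\text{sum of weights}}
  \approx \frac{6678}{38900}
  \approx 0.1717 = 17.17\%
\]
```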

The following PROC SURVEYFREQ statements request confidence limits for the weighted frequencies (totals), a chi-square goodness-of-fit test, and a weighted frequency plot for the one-way table of Response. The ODS GRAPHICS ON statement enables ODS Graphics.

   title 'Student Information System Survey';
   ods graphics on;
   proc surveyfreq data=sis_survey nosummary;
      tables Response / clwt nopct chisq plots=wtfreqplot;
      strata State NewUser;
      cluster School;
      weight SamplingWeight;
   run;
   ods graphics off;

The NOSUMMARY option in the PROC SURVEYFREQ statement suppresses the Data Summary table. In the TABLES statement, the CLWT option requests confidence limits for the weighted frequencies (totals). The NOPCT option suppresses display of the percentages and their standard errors. The CHISQ option requests a Rao-Scott chi-square goodness-of-fit test, and the PLOTS= option requests a weighted frequency plot. ODS Graphics must be enabled before producing plots.

Figure 94.3 shows the one-way table of Response, which includes confidence limits for the weighted frequencies. The 95% confidence limits for the total number of users that are 'Very Unsatisfied' are 5692 and 7665. To change the α level, which equals 5% by default (giving 95% confidence limits), you can use the ALPHA= option. Like the other estimates and standard errors produced by PROC SURVEYFREQ, these confidence limit computations take into account the complex survey design and apply to the entire study population.

Figure 94.3 Confidence Limits for Response Totals

   Student Information System Survey
   The SURVEYFREQ Procedure

   Table of Response

                                  Weighted   Std Dev of   95% Confidence Limits
   Response          Frequency   Frequency     Wgt Freq        for Wgt Freq
   -----------------------------------------------------------------------------
   Very Unsatisfied        304        6678    501.61039       5692         7665
   Unsatisfied             326        6907    495.94101       5932         7882
   Neutral                 581       12291    617.20147      11077        13505
   Satisfied               455        9309    572.27868       8184        10435
   Very Satisfied          184        3714    370.66577       2985         4443
   Total                  1850       38900    129.85268      38644        39155
   -----------------------------------------------------------------------------
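The confidence limits in Figure 94.3 can be reproduced approximately by hand. The sketch below assumes the default Taylor series method, for which the degrees of freedom are taken as the number of clusters minus the number of strata (370 − 6 = 364), and a t quantile of about 1.967 for those degrees of freedom. The inputs are the rounded values shown in the table, so the result agrees with the printed limits only to within rounding.

```latex
\[
\widehat{N} \pm t_{364,\,0.975}\,\mathrm{StdDev}(\widehat{N})
  \approx 6678 \pm 1.967 \times 501.61
  \approx 6678 \pm 987
\]
```

This gives limits of roughly 5691 and 7665 for 'Very Unsatisfied', against the printed 5692 and 7665.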

Figure 94.4 displays the weighted frequency plot of Response. The plot displays weighted frequencies (totals) together with their confidence limits in the form of a vertical bar chart. You can use the PLOTS= option to request a dot plot instead of a bar chart or to plot percentages instead of weighted frequencies.

Figure 94.4 Bar Chart of Response Totals

Figure 94.5 shows the chi-square goodness-of-fit results for the table of Response. The null hypothesis for this test is equal proportions for the levels of the one-way table. (To test a null hypothesis of specified proportions instead of equal proportions, you can use the TESTP= option to specify null hypothesis proportions.)

The chi-square test provided by the CHISQ option is the Rao-Scott design-adjusted chi-square test, which takes the sample design into account and provides inferences for the study population. To produce the Rao-Scott chi-square statistic, PROC SURVEYFREQ first computes the usual Pearson chi-square statistic based on the weighted frequencies, and then adjusts this value with a design correction. An F approximation is also provided. For the table of Response, the F value is 30.0972 with a p-value of <0.0001, which indicates rejection of the null hypothesis of equal proportions for all response levels.
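The relationship among these quantities can be checked by hand from the values reported in Figure 94.5. The sketch below introduces its own notation for illustration: Q_P is the Pearson chi-square, D is the design correction, Q_RS is the Rao-Scott chi-square, and C = 5 is the number of response levels. It assumes that the F statistic is the Rao-Scott chi-square divided by its degrees of freedom and that the denominator degrees of freedom are C − 1 times the number of clusters minus the number of strata; both assumptions are consistent with the printed output.

```latex
\begin{align*}
Q_{RS} &= \frac{Q_P}{D} = \frac{251.8105}{2.0916} \approx 120.39 \\
F &= \frac{Q_{RS}}{C-1} = \frac{120.3889}{4} \approx 30.0972 \\
\text{Den DF} &= (C-1)\,(370 - 6) = 4 \times 364 = 1456
\end{align*}
```

The small gap between 120.39 and the printed 120.3889 comes from using the design correction rounded to four decimal places.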

Figure 94.5 Chi-Square Goodness-of-Fit Test for Response

   Rao-Scott Chi-Square Test

   Pearson Chi-Square      251.8105
   Design Correction         2.0916

   Rao-Scott Chi-Square    120.3889
   DF                             4
   Pr > ChiSq                <.0001

   F Value                  30.0972
   Num DF                         4
   Den DF                      1456
   Pr > F                    <.0001

   Sample Size = 1850

Continuing to analyze the SIS_Survey data, the following PROC SURVEYFREQ statements request a two-way table of SchoolType by Response:

   title 'Student Information System Survey';
   ods graphics on;
   proc surveyfreq data=sis_survey nosummary;
      tables SchoolType * Response /
             plots=wtfreqplot(type=dot scale=percent groupby=row);
      strata State NewUser;
      cluster School;
      weight SamplingWeight;
   run;
   ods graphics off;

The STRATA, CLUSTER, and WEIGHT statements do not change from the one-way table analysis, because the sample design and the input data set are the same. These SURVEYFREQ statements request a different table but specify the same sample design information.

The ODS GRAPHICS ON statement enables ODS Graphics. The PLOTS= option in the TABLES statement requests a plot of SchoolType by Response, and the TYPE=DOT plot-option specifies a dot plot instead of the default bar chart. The SCALE=PERCENT plot-option requests a plot of percentages instead of totals. The GROUPBY=ROW plot-option groups the graph cells by the row variable (SchoolType).

Figure 94.6 shows the two-way table produced for SchoolType by Response. The first variable named in the two-way table request, SchoolType, is referred to as the row variable, and the second variable, Response, is referred to as the column variable. Two-way tables display all column variable levels for each row variable level. This two-way table lists all levels of the column variable Response for each level of the row variable SchoolType, 'Middle School' and 'High School'. Also, SchoolType = 'Total' shows the distribution of Response overall for both types of schools, and Response = 'Total' provides totals over all levels of response, for each type of school and overall. To suppress these totals, you can specify the NOTOTAL option.

Figure 94.6 Two-Way Table of SchoolType by Response

   Student Information System Survey
   The SURVEYFREQ Procedure

   Table of SchoolType by Response

                                                Weighted   Std Dev of              Std Err of
   SchoolType     Response         Frequency   Frequency     Wgt Freq    Percent      Percent
   --------------------------------------------------------------------------------------------
   Middle School  Very Unsatisfied       116        2496    351.43834     6.4155       0.9030
                  Unsatisfied            109        2389    321.97957     6.1427       0.8283
                  Neutral                234        4856    504.20553    12.4847       1.2953
                  Satisfied              197        4064    443.71188    10.4467       1.1417
                  Very Satisfied          94        1952    302.17144     5.0193       0.7758
                  Total                  750       15758         1000    40.5089       2.5691
   --------------------------------------------------------------------------------------------
   High School    Very Unsatisfied       188        4183    431.30589    10.7521       1.1076
                  Unsatisfied            217        4518    446.31768    11.6137       1.1439
                  Neutral                347        7434    574.17175    19.1119       1.4726
                  Satisfied              258        5245    498.03221    13.4845       1.2823
                  Very Satisfied          90        1762    255.67158     4.5290       0.6579
                  Total                 1100       23142         1003    59.4911       2.5691
   --------------------------------------------------------------------------------------------
   Total          Very Unsatisfied       304        6678    501.61039    17.1676       1.2872
                  Unsatisfied            326        6907    495.94101    17.7564       1.2712
                  Neutral                581       12291    617.20147    31.5965       1.5795
                  Satisfied              455        9309    572.27868    23.9311       1.4761
                  Very Satisfied         184        3714    370.66577     9.5483       0.9523
                  Total                 1850       38900    129.85268    100.000
   --------------------------------------------------------------------------------------------

Figure 94.7 displays the weighted frequency dot plot that PROC SURVEYFREQ produces for the table of SchoolType and Response. The GROUPBY=ROW plot-option groups the graph cells by the row variable (SchoolType). If you do not specify GROUPBY=ROW, the procedure groups the graph cells by the column variable by default. You can plot percentages instead of weighted frequencies by specifying the SCALE=PERCENT plot-option. You can use other plot-options to change the orientation of the plot or to request a different two-way layout.
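As one illustration of changing the plot orientation, the sketch below requests the default bar chart drawn horizontally. It assumes that ORIENT=HORIZONTAL is accepted as a WTFREQPLOT plot-option in your release; that plot-option is not described in this excerpt, so confirm it against the PLOTS= option documentation before relying on it.

```sas
ods graphics on;
proc surveyfreq data=sis_survey nosummary;
   /* ORIENT=HORIZONTAL is an assumption here; it is not taken from the text above */
   tables SchoolType * Response /
          plots=wtfreqplot(scale=percent groupby=row orient=horizontal);
   strata State NewUser;
   cluster School;
   weight SamplingWeight;
run;
ods graphics off;
```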

Figure 94.7 Dot Plot of Percentages for SchoolType by Response

By default, without any other TABLES statement options, a two-way table displays the frequency, the weighted frequency and its standard deviation, and the percentage and its standard error for each table cell (combination of row and column variable levels). But there are several options available to customize your table display by adding more information or by suppressing some of the default information.

The following PROC SURVEYFREQ statements request a two-way table of SchoolType by Response that displays row percentages, and also request a chi-square test of association between the two variables:

   title 'Student Information System Survey';
   proc surveyfreq data=sis_survey nosummary;
      tables SchoolType * Response / row nowt chisq;
      strata State NewUser;
      cluster School;
      weight SamplingWeight;
   run;

The ROW option in the TABLES statement requests row percentages, which give the distribution of Response within each level of the row variable SchoolType. The NOWT option suppresses display of the weighted frequencies and their standard deviations. The CHISQ option requests a Rao-Scott chi-square test of association between SchoolType and Response.

Figure 94.8 displays the two-way table of SchoolType by Response. For middle schools, it is estimated that 25.79% of school personnel are satisfied with the student information system and 12.39% are very satisfied. For high schools, these estimates are 22.67% and 7.61%, respectively.

Figure 94.9 displays the chi-square test results. The Rao-Scott chi-square statistic equals 9.04, and the corresponding F value is 2.26 with a p-value of 0.0605. This indicates an association between school type (middle school or high school) and satisfaction with the student information system at the 10% significance level.

Figure 94.8 Two-Way Table with Row Percentages

   Student Information System Survey
   The SURVEYFREQ Procedure

   Table of SchoolType by Response

                                                         Std Err of        Row    Std Err of
   SchoolType     Response         Frequency    Percent     Percent    Percent   Row Percent
   --------------------------------------------------------------------------------------------
   Middle School  Very Unsatisfied       116     6.4155      0.9030    15.8373        1.9920
                  Unsatisfied            109     6.1427      0.8283    15.1638        1.8140
                  Neutral                234    12.4847      1.2953    30.8196        2.5173
                  Satisfied              197    10.4467      1.1417    25.7886        2.2947
                  Very Satisfied          94     5.0193      0.7758    12.3907        1.7449
                  Total                  750    40.5089      2.5691    100.000
   --------------------------------------------------------------------------------------------
   High School    Very Unsatisfied       188    10.7521      1.1076    18.0735        1.6881
                  Unsatisfied            217    11.6137      1.1439    19.5218        1.7280
                  Neutral                347    19.1119      1.4726    32.1255        2.0490
                  Satisfied              258    13.4845      1.2823    22.6663        1.9240
                  Very Satisfied          90     4.5290      0.6579     7.6128        1.0557
                  Total                 1100    59.4911      2.5691    100.000
   --------------------------------------------------------------------------------------------
   Total          Very Unsatisfied       304    17.1676      1.2872
                  Unsatisfied            326    17.7564      1.2712
                  Neutral                581    31.5965      1.5795
                  Satisfied              455    23.9311      1.4761
                  Very Satisfied         184     9.5483      0.9523
                  Total                 1850    100.000
   --------------------------------------------------------------------------------------------
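The row percentages in Figure 94.8 follow from the overall percentages in the same table: each one is the cell percentage divided by the percentage for its row total. The following worked check uses the displayed values for the 'Satisfied' level and is only a sketch of the arithmetic, not of the procedure's internal computation.

```latex
\begin{align*}
\text{Row Percent} &= 100 \times \frac{\text{Percent (cell)}}{\text{Percent (row total)}} \\
\text{Middle School, Satisfied:}\quad & 100 \times \frac{10.4467}{40.5089} \approx 25.79 \\
\text{High School, Satisfied:}\quad & 100 \times \frac{13.4845}{59.4911} \approx 22.67
\end{align*}
```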

Figure 94.9 Chi-Square Test of No Association

   Rao-Scott Chi-Square Test

   Pearson Chi-Square       18.7829
   Design Correction         2.0766

   Rao-Scott Chi-Square      9.0450
   DF                             4
   Pr > ChiSq                0.0600

   F Value                   2.2613
   Num DF                         4
   Den DF                      1456
   Pr > F                    0.0605

   Sample Size = 1850

Syntax: SURVEYFREQ Procedure

The following statements are available in the SURVEYFREQ procedure:

   PROC SURVEYFREQ < options > ;
      BY variables ;
      CLUSTER variables ;
      REPWEIGHTS variables < / options > ;
      STRATA variables < / option > ;
      TABLES requests < / options > ;
      WEIGHT variable ;

The PROC SURVEYFREQ statement invokes the procedure, identifies the data set to be analyzed, and specifies the variance estimation method to use. The PROC SURVEYFREQ statement is required.

The TABLES statement specifies frequency or crosstabulation tables and requests tests and statistics for those tables. The STRATA statement lists the variables that form the strata in a stratified sample design. The CLUSTER statement specifies cluster identification variables in a clustered sample design. The WEIGHT statement names the sampling weight variable. The REPWEIGHTS statement names replicate weight variables for BRR or jackknife variance estimation. The BY statement requests completely separate analyses of groups defined by the BY variables.

All statements can appear multiple times except the PROC SURVEYFREQ statement and the WEIGHT statement, which can appear only once.

The rest of this section gives detailed syntax information for the BY, CLUSTER, REPWEIGHTS, STRATA, TABLES, and WEIGHT statements in alphabetical order after the description of the PROC SURVEYFREQ statement.
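To show how the REPWEIGHTS statement fits with the others, here is a minimal sketch of a replicate-weight analysis. The data set MySurvey and the variables FinalWt and RepWt1 through RepWt40 are hypothetical names used only for illustration; they are not part of the SIS_Survey example.

```sas
/* Hypothetical data set and variable names, for illustration only */
proc surveyfreq data=MySurvey varmethod=jackknife;
   weight FinalWt;              /* full-sample sampling weight */
   repweights RepWt1-RepWt40;   /* replicate weights already stored in the data set */
   tables Response;
run;
```

In this sketch the replicate weights carry the design information, so no STRATA or CLUSTER statement is shown.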

PROC SURVEYFREQ Statement

PROC SURVEYFREQ < options > ;

The PROC SURVEYFREQ statement invokes the SURVEYFREQ procedure. It also identifies the data set to be analyzed, specifies the variance estimation method to use, and provides sample design information. The DATA= option names the input data set to be analyzed. The VARMETHOD= option specifies the variance estimation method, which is the Taylor series method by default. For Taylor series variance estimation, you can include a finite population correction factor in the analysis by providing either the sampling rate or population total with the RATE= or TOTAL= option. If your design is stratified with different sampling rates or totals for different strata, you can input these stratum rates or totals in a SAS data set that contains the stratification variables.

Table 94.1 summarizes the options available in the PROC SURVEYFREQ statement.

Table 94.1 PROC SURVEYFREQ Statement Options

Option        Description
-----------------------------------------------------------------------
DATA=         Names the input SAS data set
MISSING       Treats missing values as a valid level
NOMCAR        Treats missing values as not missing completely at random
NOSUMMARY     Suppresses the display of the Data Summary table
ORDER=        Specifies the order of variable levels
PAGE          Displays only one table per page
RATE=         Specifies the first-stage sampling rate
TOTAL=        Specifies the total number of primary sampling units
VARHEADER=    Specifies the variable identification to display
VARMETHOD=    Specifies the variance estimation method
-----------------------------------------------------------------------

You can specify the following options in the PROC SURVEYFREQ statement:

DATA=SAS-data-set
names the SAS-data-set to be analyzed by PROC SURVEYFREQ. If you omit the DATA= option, the procedure uses the most recently created SAS data set.

MISSING
treats missing values as a valid (nonmissing) category for all categorical variables, which include TABLES, STRATA, and CLUSTER variables.
By default, if you do not specify the MISSING option, an observation is excluded from the analysis if it has a missing value for any STRATA or CLUSTER variable. Additionally, PROC SURVEYFREQ excludes an observation from a frequency or crosstabulation table if that observation has a missing value for any of the variables in the table request, unless you specify the MISSING option. For more information, see the section Missing Values on page 8033.

NOMCAR
includes observations with missing values of TABLES variables in the variance computation as not missing completely at random (NOMCAR) for Taylor series variance estimation. When you specify the NOMCAR option, PROC SURVEYFREQ computes variance estimates by analyzing the nonmissing values as a domain (subpopulation), where the entire population includes both nonmissing and missing domains. For more information, see the section Missing Values on page 8033.

By default, PROC SURVEYFREQ completely excludes an observation from a frequency or crosstabulation table (and the corresponding variance computations) if that observation has a missing value for any of the variables in the table request, unless you specify the MISSING option. The NOMCAR option has no effect when you specify the MISSING option, which treats missing values as a valid nonmissing level.

The NOMCAR option applies only to Taylor series variance estimation. The replication methods, which you request with the VARMETHOD=BRR and VARMETHOD=JACKKNIFE options, do not use the NOMCAR option.

NOSUMMARY
suppresses the display of the Data Summary table, which PROC SURVEYFREQ produces by default. For information about this table, see the section Data Summary Table on page 8070.

ORDER=DATA | FORMATTED | FREQ | INTERNAL
specifies the order of the variable levels in the frequency and crosstabulation tables, which you request in the TABLES statement. The ORDER= option also controls the order of the STRATA variable levels in the Stratum Information table.

The ORDER= option can take the following values:

ORDER=       Levels Ordered By
-----------------------------------------------------------------------
DATA         Order of appearance in the input data set
FORMATTED    External formatted value, except for numeric variables
             with no explicit format, which are sorted by their
             unformatted (internal) value
FREQ         Descending frequency count; levels with the most
             observations come first in the order
INTERNAL     Unformatted value
-----------------------------------------------------------------------

By default, ORDER=INTERNAL.
The FORMATTED and INTERNAL orders are machine-dependent. The frequency count used by ORDER=FREQ is the nonweighted frequency (sample size), rather than the weighted frequency.

For more information about sort order, see the chapter on the SORT procedure in the Base SAS Procedures Guide and the discussion of BY-group processing in SAS Language Reference: Concepts.

PAGE
displays only one table per page. Otherwise, PROC SURVEYFREQ displays multiple tables per page as space permits.
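The options described so far can be combined in a single PROC SURVEYFREQ statement. The following is an illustrative sketch, again using hypothetical data set and variable names (MySurvey, Region, PSU, SamplingWeight, Response).

```sas
/* Sketch only: all data set and variable names are hypothetical */
proc surveyfreq data=MySurvey
                nomcar        /* treat missing TABLES values as a separate domain (Taylor series only) */
                order=freq    /* order levels by descending nonweighted frequency                      */
                nosummary;    /* suppress the Data Summary table                                       */
   strata Region;
   cluster PSU;
   weight SamplingWeight;
   tables Response;
run;
```

Replacing NOMCAR with MISSING would instead treat missing values as a valid level; as noted above, NOMCAR has no effect when MISSING is specified.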

RATE=value
RATE=SAS-data-set
R=value
R=SAS-data-set
specifies the sampling rate, which PROC SURVEYFREQ uses to compute a finite population correction for Taylor series variance estimation. You can provide a single sampling rate value, or you can provide stratum sampling rates by specifying a SAS-data-set. If your sample design has multiple stages, you should specify the first-stage sampling rate, which is the ratio of the number of primary sampling units (PSUs) in the sample to the total number of PSUs in the population.

For a nonstratified sample design, or for a stratified sample design that uses the same sampling rate in all strata, you should specify a single sampling rate value. If your design is stratified and uses different sampling rates in different strata, you should name a SAS-data-set that contains the stratification variables and the stratum sampling rates. You should provide the stratum sampling rates in the data set variable named _RATE_. For more information, see the section Population Totals and Sampling Rates on page 8032.

The sampling rate values must be nonnegative numbers. You can specify sampling rates as numbers between 0 and 1. Alternatively, you can specify sampling rates in percentage form as numbers between 1 and 100, which PROC SURVEYFREQ converts to proportions. The procedure treats the value 1 as 100% instead of 1%.

If you do not specify the RATE= or the TOTAL= option, the Taylor series variance estimation does not include a finite population correction. You cannot specify both the RATE= and the TOTAL= option in the same PROC SURVEYFREQ statement.

PROC SURVEYFREQ does not use the RATE= or the TOTAL= option for BRR or jackknife variance estimation (which you can request by specifying the VARMETHOD=BRR or VARMETHOD=JACKKNIFE option, respectively).
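To illustrate the two ways of writing a sampling rate, the sketch below requests a 5% first-stage sampling rate as a proportion. The data set and variable names are hypothetical.

```sas
/* Sketch only: MySurvey, PSU, SamplingWeight, and Response are hypothetical names.   */
/* rate=0.05 gives the rate as a proportion; rate=5 would request the same 5% rate    */
/* in percentage form. A value of exactly 1 is read as 100%, not 1%.                  */
proc surveyfreq data=MySurvey rate=0.05;
   cluster PSU;
   weight SamplingWeight;
   tables Response;
run;
```

A single value like this is appropriate only when the design is nonstratified or every stratum shares the same rate; otherwise the rates belong in a data set that has a _RATE_ variable.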
TOTAL=value
TOTAL=SAS-data-set
N=value
N=SAS-data-set
specifies the total number of primary sampling units (PSUs), which PROC SURVEYFREQ uses to compute a finite population correction for Taylor series variance estimation. You can provide a single total value, or you can provide stratum totals by specifying a SAS-data-set. The totals must be positive numbers. If your sample design has multiple stages, you should specify the total number of primary sampling units (PSUs).

For a nonstratified sample design, you should specify a single total value, which refers to the total number of PSUs in the population. For a stratified sample design that has the same population total in each stratum, you can specify a single total value, which refers to the total number of PSUs in each stratum. If your design is stratified and has different totals in different strata, you should name a SAS-data-set that contains the stratification variables and the stratum totals. You should provide the stratum totals in the data set variable named _TOTAL_. For more information, see the section Population Totals and Sampling Rates on page 8032.
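When totals differ by stratum, they are supplied in a data set rather than as a single value. The following sketch shows one possible layout; the data set names StratumTotals and MySurvey, the stratification variable Region, and the totals themselves are made up for illustration. Only the variable name _TOTAL_ is prescribed by the procedure.

```sas
/* Sketch only: names and values are hypothetical, except the required _TOTAL_ variable */
data StratumTotals;
   input Region $ _TOTAL_;     /* stratification variable plus the stratum PSU total */
   datalines;
East  120
West   95
;
run;

proc surveyfreq data=MySurvey total=StratumTotals;
   strata Region;              /* must match the stratification variable in StratumTotals */
   cluster PSU;
   weight SamplingWeight;
   tables Response;
run;
```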

If you do not specify the RATE= or the TOTAL= option, the Taylor series variance estimation does not include a finite population correction. You cannot specify both the RATE= and the TOTAL= option in the same PROC SURVEYFREQ statement.

PROC SURVEYFREQ does not use the RATE= or the TOTAL= option for BRR or jackknife variance estimation (which you can request by specifying the VARMETHOD=BRR or VARMETHOD=JACKKNIFE option, respectively).

VARHEADER=LABEL | NAME | NAMELABEL
specifies the variable identification to use in the displayed output. By default VARHEADER=NAME, which displays variable names in the output.

The VARHEADER= option affects the headers of the variable level columns in one-way frequency tables, crosstabulation tables, and the Stratum Information table. The VARHEADER= option also controls variable identification in the table headers.

The VARHEADER= option can take the following values:

VARHEADER=   Variable Identification Displayed
-----------------------------------------------------------------------
LABEL        Variable label
NAME         Variable name
NAMELABEL    Variable name and label, as Name (Label)
-----------------------------------------------------------------------

VARMETHOD=BRR < (method-options) >
VARMETHOD=JACKKNIFE | JK < (method-options) >
VARMETHOD=TAYLOR
specifies the variance estimation method. VARMETHOD=TAYLOR requests the Taylor series method, which is the default if you do not specify the VARMETHOD= option or the REPWEIGHTS statement. VARMETHOD=BRR requests variance estimation by balanced repeated replication (BRR), and VARMETHOD=JACKKNIFE requests variance estimation by the delete-1 jackknife method.

For VARMETHOD=BRR and VARMETHOD=JACKKNIFE, you can specify method-options in parentheses after the variance method name. Table 94.2 summarizes the available method-options.
Table 94.2 Variance Estimation Options

VARMETHOD=       Variance Estimation Method      Method Options
-----------------------------------------------------------------------------
BRR              Balanced repeated replication   DFADJ
                                                 FAY < =value >
                                                 HADAMARD=SAS-data-set
                                                 OUTWEIGHTS=SAS-data-set
                                                 PRINTH
                                                 REPS=number
JACKKNIFE | JK   Jackknife                       DFADJ
                                                 OUTJKCOEFS=SAS-data-set
                                                 OUTWEIGHTS=SAS-data-set
TAYLOR           Taylor series linearization     None
-----------------------------------------------------------------------------

Method-options must be enclosed in parentheses after the variance method name. For example:

   varmethod=brr(reps=60 outweights=myreplicateweights)

You can specify the following values for the VARMETHOD= option:

BRR < (method-options) >
requests variance estimation by balanced repeated replication (BRR). The BRR method requires a stratified sample design that has two primary sampling units (PSUs) in each stratum. If you specify this option, you must also specify a STRATA statement unless you use a REPWEIGHTS statement to provide replicate weights. For more information, see the section Balanced Repeated Replication (BRR) on page 8043.

You can specify the following method-options:

DFADJ
computes the degrees of freedom as the number of nonmissing strata for the individual table request. If you specify this option, PROC SURVEYFREQ does not count any empty strata that occur when observations that have missing values of the TABLES variables are removed from the analysis of the table.

By default, PROC SURVEYFREQ computes the degrees of freedom by counting the number of nonmissing strata for all valid observations in the input data set. For more information, see the section Degrees of Freedom on page 8051. For information about valid observations, see the section Data Summary Table on page 8070.

This method-option has no effect when you specify the MISSING option, which treats missing values as a valid nonmissing level.

This method-option is not used when you specify the degrees of freedom in the DF= option in the TABLES statement or when you specify a REPWEIGHTS statement to provide replicate weights. When you specify a REPWEIGHTS statement, the degrees of freedom are the number of REPWEIGHTS variables (replicates) unless you specify the DF= option in the REPWEIGHTS or the TABLES statement.

FAY < =value >
requests Fay's method, which is a modification of the BRR method. For more information, see the section Fay's BRR Method on page 8044.

You can specify the value of the Fay coefficient, which is used in converting the original sampling weights to replicate weights. The Fay coefficient must be a nonnegative number less than 1. By default, the Fay coefficient is 0.5.

HADAMARD=SAS-data-set
H=SAS-data-set
names a SAS-data-set that contains the Hadamard matrix for BRR replicate construction. If you do not specify this method-option, PROC SURVEYFREQ generates an appropriate Hadamard matrix for replicate construction. For more information, see the sections Balanced Repeated Replication (BRR) on page 8043 and Hadamard Matrix on page 8045.

If a Hadamard matrix of a particular dimension exists, it is not necessarily unique. Therefore, if you want to use a specific Hadamard matrix, you must provide the matrix as a SAS-data-set in this method-option.

In the HADAMARD= input data set, each variable corresponds to a column and each observation corresponds to a row of the Hadamard matrix. You can use any variable names in the HADAMARD= data set. All values in the data set must equal either 1 or -1. You must ensure that the matrix you provide is indeed a Hadamard matrix; that is, A'A = RI, where A is the Hadamard matrix of dimension R and I is an identity matrix. PROC SURVEYFREQ does not check the validity of the Hadamard matrix that you provide.

The HADAMARD= input data set must contain at least H variables, where H denotes the number of first-stage strata in your design. If the data set contains more than H variables, PROC SURVEYFREQ uses only the first H variables. Similarly, the HADAMARD= input data set must contain at least H observations.

If you do not specify the REPS= method-option, the number of replicates is assumed to be the number of observations in the HADAMARD= input data set. If you specify the number of replicates (for example, REPS=nreps), the first nreps observations in the HADAMARD= data set are used to construct the replicates.

You can specify the PRINTH method-option to display the Hadamard matrix that PROC SURVEYFREQ uses to construct replicates for BRR.

OUTWEIGHTS=SAS-data-set
names a SAS-data-set to store the replicate weights that PROC SURVEYFREQ creates for BRR variance estimation. For information about replicate weights, see the section Balanced Repeated Replication (BRR) on page 8043. For information about the contents of the OUTWEIGHTS= data set, see the section Replicate Weight Output Data Set on page 8069.

The OUTWEIGHTS= method-option is not available when you provide replicate weights in a REPWEIGHTS statement.

PRINTH
displays the Hadamard matrix that PROC SURVEYFREQ uses to construct replicates for BRR variance estimation.
When you provide the Hadamard matrix in the HADAMARD= method-option, PROC SURVEYFREQ displays only the rows and columns that are actually used to construct replicates. For more information, see the sections Balanced Repeated Replication (BRR) on page 8043 and Hadamard Matrix on page 8045.

The PRINTH method-option is not available when you provide replicate weights in a REPWEIGHTS statement because the procedure does not use a Hadamard matrix in this case.

REPS=number
specifies the number of replicates for BRR variance estimation. The value of number must be an integer greater than 1.

If you do not use the HADAMARD= method-option to provide a Hadamard matrix, the number of replicates should be greater than the number of strata and should be a multiple of 4. For more information, see the section Balanced Repeated Replication (BRR) on page 8043. If PROC SURVEYFREQ cannot construct a Hadamard matrix for the REPS= value that you specify, the value is increased until a Hadamard matrix of that dimension can be constructed. Therefore, the actual number of replicates that PROC SURVEYFREQ uses might be larger than number.
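As an illustration of the HADAMARD= and PRINTH method-options described above, the following sketch supplies a 4x4 Hadamard matrix for a hypothetical design with three strata and two PSUs per stratum. The data set names MyHadamard and MySurvey and all variable names are assumptions made for this example.

```sas
/* Sketch only: hypothetical names throughout.                                    */
/* Each variable is a column and each observation is a row of the matrix.         */
/* Every entry is 1 or -1, and the columns are mutually orthogonal, so A'A = 4I.  */
data MyHadamard;
   input c1 c2 c3 c4;
   datalines;
1  1  1  1
1 -1  1 -1
1  1 -1 -1
1 -1 -1  1
;
run;

/* With three strata, only the first three variables of MyHadamard are used */
proc surveyfreq data=MySurvey varmethod=brr(hadamard=MyHadamard printh);
   strata Region;
   cluster PSU;
   weight SamplingWeight;
   tables Response;
run;
```

Because the procedure does not validate the matrix, it is worth checking the orthogonality of the columns before relying on the results.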

If you use the HADAMARD= method-option to provide a Hadamard matrix, the value of number must not be less than the number of rows in the Hadamard matrix. If you provide a Hadamard matrix and do not specify the REPS= method-option, the number of replicates equals the number of rows in the Hadamard matrix.

If you do not specify the REPS= or the HADAMARD= method-option and do not use a REPWEIGHTS statement, the number of replicates equals the smallest multiple of 4 that is greater than the number of strata.

If you use a REPWEIGHTS statement to provide replicate weights, PROC SURVEYFREQ does not use the REPS= method-option; the number of replicates equals the number of REPWEIGHTS variables.

JACKKNIFE < (method-options) >
JK < (method-options) >
requests variance estimation by the delete-1 jackknife method. For more information, see the section The Jackknife Method on page 8046. If you use a REPWEIGHTS statement to provide replicate weights, VARMETHOD=JACKKNIFE is the default variance estimation method.

The delete-1 jackknife method requires at least two primary sampling units (PSUs) in each stratum for stratified designs unless you use a REPWEIGHTS statement to provide replicate weights.

You can specify the following method-options:

DFADJ
computes the degrees of freedom by using the number of nonmissing strata and clusters for the individual table request. If you specify this method-option, PROC SURVEYFREQ does not count any empty strata or clusters that occur when observations that have missing values of the TABLES variables are removed from the analysis of the table.

By default, PROC SURVEYFREQ computes the degrees of freedom by counting the number of nonmissing strata and clusters for all valid observations in the input data set. The degrees of freedom for VARMETHOD=JACKKNIFE equal the number of clusters minus the number of strata. For more information, see the section Degrees of Freedom on page 8051. For information about valid observations, see the section Data Summary Table on page 8070.

This method-option has no effect when you specify the MISSING option, which treats missing values as a valid nonmissing level.

This method-option is not used when you specify the degrees of freedom in the DF= option in the TABLES statement or when you specify a REPWEIGHTS statement to provide replicate weights. When you specify a REPWEIGHTS statement, the degrees of freedom are the number of REPWEIGHTS variables (replicates) unless you specify the DF= option in the REPWEIGHTS or the TABLES statement.

OUTJKCOEFS=SAS-data-set
names a SAS-data-set to store the jackknife coefficients. For information about jackknife coefficients, see the section The Jackknife Method on page 8046. For information about the contents of the OUTJKCOEFS= data set, see the section Jackknife Coefficient Output Data Set on page 8070.

OUTWEIGHTS=SAS-data-set
names a SAS-data-set to store the replicate weights that PROC SURVEYFREQ creates for jackknife variance estimation. For information about replicate weights, see the section The Jackknife Method on page 8046. For information about the contents of the OUTWEIGHTS= data set, see the section Replicate Weight Output Data Set on page 8069.

This method-option is not available when you use a REPWEIGHTS statement to provide replicate weights.

TAYLOR
requests Taylor series variance estimation. This is the default method if you do not specify the VARMETHOD= option or a REPWEIGHTS statement. For more information, see the section Taylor Series Variance Estimation on page 8036.

BY Statement

BY variables ;

You can specify a BY statement with PROC SURVEYFREQ to obtain separate analyses of observations in groups that are defined by the BY variables. When a BY statement appears, the procedure expects the input data set to be sorted in order of the BY variables. If you specify more than one BY statement, only the last one specified is used.

If your input data set is not sorted in ascending order, use one of the following alternatives:

- Sort the data by using the SORT procedure with a similar BY statement.
- Specify the NOTSORTED or DESCENDING option in the BY statement for the SURVEYFREQ procedure. The NOTSORTED option does not mean that the data are unsorted but rather that the data are arranged in groups (according to values of the BY variables) and that these groups are not necessarily in alphabetical or increasing numeric order.
- Create an index on the BY variables by using the DATASETS procedure (in Base SAS software).

Using a BY statement provides completely separate analyses of the BY groups. It does not provide a statistically valid domain (subpopulation) analysis, where the total number of units in the subpopulation is not known with certainty. You should include the domain variable(s) in your TABLES request to obtain domain analysis.
For more information, see the section Domain Analysis on page 8033. For more information about BY-group processing, see the discussion in SAS Language Reference: Concepts. For more information about the DATASETS procedure, see the discussion in the Base SAS Procedures Guide.
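The distinction between BY-group processing and domain analysis can be sketched as follows. The data set MySurvey and the design variables Region, PSU, and SamplingWeight are hypothetical; SchoolType and Response are used here only as example analysis variables.

```sas
/* Sketch only: MySurvey, Region, PSU, and SamplingWeight are hypothetical names */

/* Completely separate analyses: the data must first be sorted by the BY variable */
proc sort data=MySurvey;
   by SchoolType;
run;

proc surveyfreq data=MySurvey;
   by SchoolType;                  /* each school type is analyzed on its own */
   strata Region;
   cluster PSU;
   weight SamplingWeight;
   tables Response;
run;

/* Domain analysis: put the domain variable in the TABLES request instead */
proc surveyfreq data=MySurvey;
   strata Region;
   cluster PSU;
   weight SamplingWeight;
   tables SchoolType * Response;   /* SchoolType acts as the domain variable */
run;
```

The second form keeps the full sample in the variance computation, which is why it is the one recommended above for subpopulation estimates.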

CLUSTER Statement

CLUSTER variables ;

The CLUSTER statement names variables that identify the first-stage clusters in a clustered sample design. First-stage clusters are also known as primary sampling units (PSUs). The combinations of categories of CLUSTER variables define the clusters in the sample. If there is a STRATA statement, clusters are nested within strata.

If your sample design has clustering at multiple stages, you should specify only the first-stage clusters (PSUs) in the CLUSTER statement. See the section Specifying the Sample Design on page 8031 for more information.

If you provide replicate weights for BRR or jackknife variance estimation with the REPWEIGHTS statement, you do not need to specify a CLUSTER statement.

The CLUSTER variables are one or more variables in the DATA= input data set. These variables can be either character or numeric, but the procedure treats them as categorical variables. The formatted values of the CLUSTER variables determine the CLUSTER variable levels. Thus, you can use formats to group values into levels. See the discussion of the FORMAT procedure in the Base SAS Procedures Guide and the discussions of the FORMAT statement and SAS formats in SAS Formats and Informats: Reference.

An observation is excluded from the analysis if it has a missing value for any CLUSTER variable unless you specify the MISSING option in the PROC SURVEYFREQ statement. See the section Missing Values on page 8033 for more information.

You can use multiple CLUSTER statements to specify CLUSTER variables. The procedure uses variables from all CLUSTER statements to create clusters.

REPWEIGHTS Statement

REPWEIGHTS variables < / options > ;

The REPWEIGHTS statement names variables that provide replicate weights for BRR or jackknife variance estimation, which you can request by specifying the VARMETHOD=BRR or VARMETHOD=JACKKNIFE option in the PROC SURVEYFREQ statement.
If you do not provide replicate weights for these methods by using a REPWEIGHTS statement, then PROC SURVEYFREQ constructs replicate weights for the analysis. See the sections Balanced Repeated Replication (BRR) on page 8043 and The Jackknife Method on page 8046 for information about replicate weights. Each REPWEIGHTS variable should contain the weights for a single replicate, and the number of replicates equals the number of REPWEIGHTS variables. The REPWEIGHTS variables must be numeric, and the variable values must be nonnegative numbers. If you provide replicate weights with a REPWEIGHTS statement, you do not need to specify a CLUSTER or STRATA statement. If you use a REPWEIGHTS statement and do not specify the VARMETHOD= option in the PROC SURVEYFREQ statement, the procedure uses VARMETHOD=JACKKNIFE by default.
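A minimal sketch of supplying your own replicate weights might look like the following. The data set MySurvey and the variables SamplingWeight, Response, and RepWt_1 through RepWt_40 are hypothetical; the number of replicates (40) is chosen arbitrarily for illustration.

```sas
/* Sketch only: all data set and variable names are hypothetical */
proc surveyfreq data=MySurvey varmethod=jackknife;
   weight SamplingWeight;            /* full-sample weight                          */
   repweights RepWt_1-RepWt_40;      /* one numeric variable per replicate (40 here) */
   tables Response;
run;
```

No STRATA or CLUSTER statement is needed here because the replicate weights already carry the design information. Omitting VARMETHOD=JACKKNIFE would give the same method, since jackknife is the default when a REPWEIGHTS statement is present.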