SD10 A SAS MACRO FOR PERFORMING BACKWARD SELECTION IN PROC SURVEYREG

Similar documents
Understanding the Concepts and Features of Macro Programming 1

BACKGROUND INFORMATION ON COMPLEX SAMPLE SURVEYS

Identifying Duplicate Variables in a SAS Data Set

Arthur L. Carpenter California Occidental Consultants, Oceanside, California

Using Templates Created by the SAS/STAT Procedures

Paper S Data Presentation 101: An Analyst s Perspective

Christopher Louden University of Texas Health Science Center at San Antonio

Two useful macros to nudge SAS to serve you

Analysis of Complex Survey Data with SAS

Paper SDA-11. Logistic regression will be used for estimation of net error for the 2010 Census as outlined in Griffin (2005).

GET A GRIP ON MACROS IN JUST 50 MINUTES! Arthur Li, City of Hope Comprehensive Cancer Center, Duarte, CA

ABSTRACT INTRODUCTION MACRO. Paper RF

MISSING DATA AND MULTIPLE IMPUTATION

A SAS Macro for Producing Benchmarks for Interpreting School Effect Sizes

Poisson Regressions for Complex Surveys

Macro to compute best transform variable for the model

Please login. Take a seat Login with your HawkID Locate SAS 9.3. Raise your hand if you need assistance. Start / All Programs / SAS / SAS 9.

A Cross-national Comparison Using Stacked Data

Frequencies, Unequal Variance Weights, and Sampling Weights: Similarities and Differences in SAS

Base and Advance SAS

Get Started Writing SAS Macros Luisa Hartman, Jane Liao, Merck Sharp & Dohme Corp.

Want to Do a Better Job? - Select Appropriate Statistical Analysis in Healthcare Research

SAS Programming Techniques for Manipulating Metadata on the Database Level Chris Speck, PAREXEL International, Durham, NC

Acknowledgments xi Preface xiii About the Author xv About This Book xvii New in the Macro Language xxi

An Easy Route to a Missing Data Report with ODS+PROC FREQ+A Data Step Mike Zdeb, FSL, University at Albany School of Public Health, Rensselaer, NY

In this paper, we will build the macro step-by-step, highlighting each function. A basic familiarity with SAS Macro language is assumed.

Tips & Tricks. With lots of help from other SUG and SUGI presenters. SAS HUG Meeting, November 18, 2010

Paper An Automated Reporting Macro to Create Cell Index An Enhanced Revisit. Shi-Tao Yeh, GlaxoSmithKline, King of Prussia, PA

Foundations and Fundamentals. SAS System Options: The True Heroes of Macro Debugging Kevin Russell and Russ Tyndall, SAS Institute Inc.

ECLT 5810 SAS Programming - Introduction

A SAS Macro for Covariate Specification in Linear, Logistic, or Survival Regression

The SAS %BLINPLUS Macro

Efficiency Programming with Macro Variable Arrays

Getting Classy: A SAS Macro for CLASS Statement Automation

Paper CC-016. METHODOLOGY Suppose the data structure with m missing values for the row indices i=n-m+1,,n can be re-expressed by

Quality Control of Clinical Data Listings with Proc Compare

/********************************************/ /* Evaluating the PS distribution!!! */ /********************************************/

The SAS interface is shown in the following screen shot:

Introduction to STATA

A User Manual for the Multivariate MLE Tool. Before running the main multivariate program saved in the SAS file Part2-Main.sas,

Paper CC06. a seed number for the random number generator, a prime number is recommended

SAS Workshop. Introduction to SAS Programming. Iowa State University DAY 2 SESSION IV

The SURVEYREG Procedure

David S. Septoff Fidia Pharmaceutical Corporation

To conceptualize the process, the table below shows the highly correlated covariates in descending order of their R statistic.

SAS seminar. The little SAS book Chapters 3 & 4. April 15, Åsa Klint. By LD Delwiche and SJ Slaughter. 3.1 Creating and Redefining variables

Paper Appendix 4 contains an example of a summary table printed from the dataset, sumary.

Stat 5100 Handout #14.a SAS: Logistic Regression

Unlock SAS Code Automation with the Power of Macros

SAS Programs SAS Lecture 4 Procedures. Aidan McDermott, April 18, Outline. Internal SAS formats. SAS Formats

/ * %STR(argument) %NRSTR(argument)

Submitting SAS Code On The Side

Open Problem for SUAVe User Group Meeting, November 26, 2013 (UVic)

A Format to Make the _TYPE_ Field of PROC MEANS Easier to Interpret Matt Pettis, Thomson West, Eagan, MN

Using PROC REPORT to Cross-Tabulate Multiple Response Items Patrick Thornton, SRI International, Menlo Park, CA

Using MACRO and SAS/GRAPH to Efficiently Assess Distributions. Paul Walker, Capital One

Handling Numeric Representation SAS Errors Caused by Simple Floating-Point Arithmetic Computation Fuad J. Foty, U.S. Census Bureau, Washington, DC

Creating Macro Calls using Proc Freq

Introduction. Getting Started with the Macro Facility CHAPTER 1

ABSTRACT INTRODUCTION TRICK 1: CHOOSE THE BEST METHOD TO CREATE MACRO VARIABLES

A Simple Framework for Sequentially Processing Hierarchical Data Sets for Large Surveys

T.I.P.S. (Techniques and Information for Programming in SAS )

SAS CURRICULUM. BASE SAS Introduction

Tales from the Help Desk 6: Solutions to Common SAS Tasks

PROGRAMMING ROLLING REGRESSIONS IN SAS MICHAEL D. BOLDIN, UNIVERSITY OF PENNSYLVANIA, PHILADELPHIA, PA

Statistics and Data Analysis. Common Pitfalls in SAS Statistical Analysis Macros in a Mass Production Environment

WEB MATERIAL. eappendix 1: SAS code for simulation

Why & How To Use SAS Macro Language: Easy Ways To Get More Value & Power from Your SAS Software Tools

SAS 9 Programming Enhancements Marje Fecht, Prowerk Consulting Ltd Mississauga, Ontario, Canada

KEYWORDS Metadata, macro language, CALL EXECUTE, %NRSTR, %TSLIT

Efficient Processing of Long Lists of Variable Names

DATA Step Debugger APPENDIX 3

SAS/STAT 14.2 User s Guide. The SURVEYIMPUTE Procedure

A Tutorial on the SAS Macro Language

Reducing Credit Union Member Attrition with Predictive Analytics

A Macro for Systematic Treatment of Special Values in Weight of Evidence Variable Transformation Chaoxian Cai, Automated Financial Systems, Exton, PA

186 Statistics, Data Analysis and Modeling. Proceedings of MWSUG '95

Generating Customized Analytical Reports from SAS Procedure Output Brinda Bhaskar and Kennan Murray, RTI International

Macro Language Dictionary

A Side of Hash for You To Dig Into

Techniques for Writing Robust SAS Macros. Martin Gregory. PhUSE Annual Conference, Oct 2009, Basel

A SAS Macro for Balancing a Weighted Sample

RUN_MACRO Run! With PROC FCMP and the RUN_MACRO Function from SAS 9.2, Your SAS Programs Are All Grown Up

Procedure for Stamping Source File Information on SAS Output Elizabeth Molloy & Breda O'Connor, ICON Clinical Research

Prove QC Quality Create SAS Datasets from RTF Files Honghua Chen, OCKHAM, Cary, NC

Utility Macros for Enterprise Miner Daniel Edstrom, Noel-Levitz, Centennial, CO

Summary Table for Displaying Results of a Logistic Regression Analysis

SAS Linear Model Demo. Overview

BreakOnWord: A Macro for Partitioning Long Text Strings at Natural Breaks Richard Addy, Rho, Chapel Hill, NC Charity Quick, Rho, Chapel Hill, NC

Going Under the Hood: How Does the Macro Processor Really Work?

= %sysfunc(dequote(&ds_in)); = %sysfunc(dequote(&var));

Introduction to SAS. I. Understanding the basics In this section, we introduce a few basic but very helpful commands.

SAS/STAT 14.2 User s Guide. The SURVEYREG Procedure

Hidden in plain sight: my top ten underpublicized enhancements in SAS Versions 9.2 and 9.3

SAS Macro Dynamics - From Simple Basics to Powerful Invocations Rick Andrews, Office of the Actuary, CMS, Baltimore, MD

Program Validation: Logging the Log

SAS Macro Language: Reference

Mapping Clinical Data to a Standard Structure: A Table Driven Approach

Don't quote me. Almost having fun with quotes and macros. By Mathieu Gaouette

Telephone Survey Response: Effects of Cell Phones in Landline Households

Transcription:

Paper SD10 A SAS MACRO FOR PERFORMING BACKWARD SELECTION IN PROC SURVEYREG Qixuan Chen, University of Michigan, Ann Arbor, MI Brenda Gillespie, University of Michigan, Ann Arbor, MI ABSTRACT This paper describes a macro to do backward selection for survey regression. At each step of backward elimination, p-values are calculated by using PROC SURVEYREG. The p-values are sorted, and the maximum is compared to the staying criterion, which is defined by users, to decide if the corresponding variable should be eliminated. Users are required to provide a list of variables that should be included in all models and a list of variables that could be eliminated during the backward elimination procedure. This macro is also capable of handling categorical variables. In the process of explaining the macro code, we will review three ways of creating macro variables and the following SAS macro functions: %scan, %str, %eval, and %unquote. INTRODUCTION Although many variables selection methods are available in other SAS procedures, like REG, LOGISTIC, PHREG, none is available in PROC SURVEYREG. The SURVEYREG procedure performs regression analysis for data with survey sampling weights. When the hypotheses of interest are known, the procedure fits linear models, calculates regression coefficients, and provides significance tests by accounting for the survey sample designs. However, sometimes the hypotheses of interest are unclear, and the pool of potential explanatory variables is big compared to the sample size. Some automatic procedures are in need to identify relatively important explanatory variables. A SAS macro was designed to do backward elimination for survey regression. EXPLANATION OF MACRO CODE Defining Macro Parameters The macro has nine parameters that must be provided by the users when the macro is called: dataset = the name of the data set that contains the outcome, explanatory variables and all the sample design information, including sampling weights, strata and clusters forceinvar = the list of variables that are forced into all the models during the backward selection procedure varlist = the list of variables that could be eliminated during the backward selection procedure catvarlist = the list of categorical variables that are either in forceinvar or in varlist outcome = the name of the outcome variable in the survey regression weight = the name of the survey sampling weights in the complex survey design strata = the name of the survey sampling strata in the complex survey design cluster = the name of the survey sampling cluster in the complex survey design alpha = the criterion for retaining a variable in the backward elimination procedure The Backward Elimination Procedure Before starting the backward elimination procedure, the number of variables in the original varlist is counted, denoted as nvar. In addition, we calculate nforceinvar, the number of variables in the forceinvar, and create a series of macro variables, called forceinvar1, forceinvar2,, forceinvar&nforceinvar. At each loop when there is at least one variable in the varlist, PROC SURVEYREG is run by putting all the forced in variables and the variables in the varlist at that iteration. The individual p-values for continuous variables and the p-values of the overall F-tests for every categorical variables are obtained from the ODS output and saved in a data set, namely out. Next, all the variables in the forceinvar are removed from the data out. The p-values are sorted among the rest of variables. The maximal p-value is compared to the staying criterion alpha. If the maximal p-value is greater than alpha, the corresponding variable is eliminated from the varlist, and nvar is subtracted by one, else the backward elimination

procedure ends. The loops continue until either nvar equals to zero or the highest p-value is less than or equal to alpha. The list of categorical variables is not updated during the backward elimination procedure, since extra categorical variables in the CLASS statement of PROC SURVEYREG does not have any effects on the parameter estimates or the significance tests. The parameter estimates in the final model are saved in the data set estimate. Note that if no significant variables are chosen, the last variable to be dropped is retained in the data set estimate. EXAMPLE OF MACRO USE In the data set example, there are 17 variables and 50 observations. The variables include Y, X1-X3, Z1- Z10, weight, strata and cluster. Y is the outcome, X1-X3 are known predictors, and Z1-Z10 are 10 potential explanatory variables. X1, Z3, Z4 and Z9 are categorical variables. The goal of the study is to identify if any variables among Z1-Z10 are important explanatory variables to Y. Additionally, we want to get estimates for X1-X3 after adjusting for other explanatory variables for Y, but without losing too many degrees of freedom. First, download and save the %backward macro definition in your system. In your SAS program, specify this statement to define the %backward macro and make it available for use: %inc "<location of your file containing the backward macro>"; Next, define the forceinvar, varlist and catvarlist by using %LET macro function. %let forceinvar = X1 X2 X3; %let varlist = Z1 Z2 Z3 Z4 Z5 Z6 Z7 Z8 Z9 Z10; %let catvarlist = X1 Z3 Z4 Z9; Finally, call the macro: %backward (example, &forceinvar, &varlist, &catvarlist, Y, weight, strata, cluster, 0.1); CONCLUSION The %backward macro is useful when there are many potential predictors. The programming time is short. The macro screens relatively important explanatory variables quickly. Moreover, the backward macro is short and easily understandable. It can also be easily extended to the backward selection procedure of any other statistical regression procedure, for example, the SURVEYLOGISTIC procedure. MACRO CODE The complete SAS code for the backward selection macro is provided below: %macro backward (dataset, forceinvar, varlist, catvarlist, outcome, weight, strata, cluster, alpha); %let nvar = 0; %do %while (%scan (&varlist, %eval(&nvar+1), %str( )) NE); %let nvar = %eval (&nvar+1); %let nforceinvar = 0; %do %while (%scan (&forceinvar, %eval(&nforceinvar + 1), %str( )) NE); %let nforceinvar = %eval(&nforceinvar + 1); %let forceinvar&nforceinvar = %scan(&forceinvar, &nforceinvar); %do %while (&nvar > 0); proc surveyreg data = &dataset; weight &weight; strata &strata; cluster &cluster; class &catvarlist;

model &outcome = &forceinvar &varlist /solution adjrsq; ods output Effects = out; ods output ParameterEstimates = estimate; data result; set out; if Effect in ('Model', 'Intercept', %do i=1 %to %eval(&nforceinvar-1); %unquote(%str(%'&&forceinvar&i%')), %unquote(%str(%'&&forceinvar&nforceinvar%'))) then delete; keep Effect ProbF; proc sort data = result; by descending ProbF; if _N_ = 1 then do; call symput('pvalue',probf); if ProbF > &alpha then delete; end; drop ProbF; %if &pvalue < &alpha %then %goto exit; proc sql noprint; select Effect into :varlist separated by ' ' from result; quit; %let nvar = %eval(&nvar-1); %exit: %mend backward; A REVIEW OF MACRO FEATURES USED IN MACRO BACKWARD Creating Macro Variables There are several ways to create macro variables. In this macro, we use %LET, CALL SYMPUT, and PROC SQL to create macro variables. Use %LET to produce a macro variable, and give it a value. EXAMPLE 1: %let nvar = 0; %put &nvar; 0 Use CALL SYMPUT to store the value of a given variable from a data set. EXAMPLE 2: data result; input Effect $ ProbF; datalines; age 0.501 BMI 0.001 sex 0.002 ;

if _N_ = 1 then CALL SYMPUT('pvalue', ProbF); %put &pvalue; 0.501 Use SELECT INTO in PROC SQL to store the values of a variable from a data set as the text of a macro variable PROC SQL noprint; select Effect into :varlist separated by ' ' from result; quit; %put &varlist; age BMI sex Functions Used in the Macro Use %SCAN function to search an argument and return the nth word. %let var1 = %scan(&varlist, 1, %str( )); %put &var1; age Use %STR function to specify a blank as the delimiter in %SCAN function or for character strings that contain a quotation mark. %let var2 = %str(% &var1% ); %put &var2; age Use %UNQUOTE function to unmask the value of a macro variable that is assigned using a macro quoting function. If the value is not unmasked, when the variable is referenced, it produces error message. *** The following program generates error messages in the LOG because the value of var2 is still masked when it reaches the SAS compiler. *** if Effect = &var2 then delete; ***This version of the program runs correctly*** if Effect =%unquote(&var2) then delete; Use %EVAL function to evaluate integer arithmetic. Continued on EXAMPLE 1: %let nvar = %eval(&nvar + 1); %put &nvar; 1

ACKNOWLEDGEMENT The authors acknowledge Dr. David Garabrant, Dr. Al Franzblau, Dr. Peter Adriaens, Dr. Avery Demond, Dr. James Lepkowski and the University of Michigan Dioxin Exposure Study for supporting the project and Kathy Welch for her continued SAS assistance. REFERENCES 1. SAS Macro Language: Reference, Version 8, Cary, NC: SAS Institute Inc. 2. Mike S. Zdeb, Creating Macro Variables Via PROC SQL, NESUG, Washington, DC 1999 CONTACT INFORMATION Please send your comments and questions to: Qixuan Chen The Department of Biostatistics School of Public Health University of Michigan 1420 Washington Heights Ann Arbor, MI 48109-2029 Email: qixuan@umich.edu