ABSTRACT INTRODUCTION TRICK 1: CHOOSE THE BEST METHOD TO CREATE MACRO VARIABLES

Similar documents
Checking for Duplicates Wendi L. Wright

Web-Application Bar Charts without SAS/GRAPH. Steve James, Centers for Disease Control and Prevention, Atlanta, GA

Taming a Spreadsheet Importation Monster

Automate Clinical Trial Data Issue Checking and Tracking

Program Validation: Logging the Log

One SAS To Rule Them All

SAS Macro Dynamics: from Simple Basics to Powerful Invocations Rick Andrews, Office of Research, Development, and Information, Baltimore, MD

Cover the Basics, Tool for structuring data checking with SAS Ole Zester, Novo Nordisk, Denmark

Data Quality Review for Missing Values and Outliers

SAS 9 Programming Enhancements Marje Fecht, Prowerk Consulting Ltd Mississauga, Ontario, Canada

Applications Development. Paper 38-28

Run your reports through that last loop to standardize the presentation attributes

Presentation Quality Bulleted Lists Using ODS in SAS 9.2. Karl M. Kilgore, PhD, Cetus Group, LLC, Timonium, MD

SAS Macro Dynamics - From Simple Basics to Powerful Invocations Rick Andrews, Office of the Actuary, CMS, Baltimore, MD

%MAKE_IT_COUNT: An Example Macro for Dynamic Table Programming Britney Gilbert, Juniper Tree Consulting, Porter, Oklahoma

My Reporting Requires a Full Staff Help!

CC13 An Automatic Process to Compare Files. Simon Lin, Merck & Co., Inc., Rahway, NJ Huei-Ling Chen, Merck & Co., Inc., Rahway, NJ

%MISSING: A SAS Macro to Report Missing Value Percentages for a Multi-Year Multi-File Information System

ABSTRACT: INTRODUCTION: WEB CRAWLER OVERVIEW: METHOD 1: WEB CRAWLER IN SAS DATA STEP CODE. Paper CC-17

A Generalized Macro-Based Data Reporting System to Produce Both HTML and Text Files

footnote1 height=8pt j=l "(Rev. &sysdate)" j=c "{\b\ Page}{\field{\*\fldinst {\b\i PAGE}}}";

ODS TAGSETS - a Powerful Reporting Method

CMISS the SAS Function You May Have Been MISSING Mira Shapiro, Analytic Designers LLC, Bethesda, MD

A SAS Macro Utility to Modify and Validate RTF Outputs for Regional Analyses Jagan Mohan Achi, PPD, Austin, TX Joshua N. Winters, PPD, Rochester, NY

Using MACRO and SAS/GRAPH to Efficiently Assess Distributions. Paul Walker, Capital One

Using Recursion for More Convenient Macros

A Quick and Gentle Introduction to PROC SQL

1. Please, please, please look at the style sheets job aid that I sent to you some time ago in conjunction with this document.

Hash Objects for Everyone

Creating Macro Calls using Proc Freq

PharmaSUG 2013 CC26 Automating the Labeling of X- Axis Sanjiv Ramalingam, Vertex Pharmaceuticals, Inc., Cambridge, MA

Have SAS Annotate your Blank CRF for you! Plus dynamically add color and style to your annotations. Steven Black, Agility-Clinical Inc.

3N Validation to Validate PROC COMPARE Output

Paper CC16. William E Benjamin Jr, Owl Computer Consultancy LLC, Phoenix, AZ

Coders' Corner. Paper Scrolling & Downloading Web Results. Ming C. Lee, Trilogy Consulting, Denver, CO. Abstract.

A Macro to Keep Titles and Footnotes in One Place

A Macro to Create Program Inventory for Analysis Data Reviewer s Guide Xianhua (Allen) Zeng, PAREXEL International, Shanghai, China

Paper HOW-06. Tricia Aanderud, And Data Inc, Raleigh, NC

Developing Data-Driven SAS Programs Using Proc Contents

Automating Preliminary Data Cleaning in SAS

How to Create Data-Driven Lists

Cleaning Duplicate Observations on a Chessboard of Missing Values Mayrita Vitvitska, ClinOps, LLC, San Francisco, CA

Use That SAP to Write Your Code Sandra Minjoe, Genentech, Inc., South San Francisco, CA

Get Started Writing SAS Macros Luisa Hartman, Jane Liao, Merck Sharp & Dohme Corp.

The Dataset Diet How to transform short and fat into long and thin

SAS Application to Automate a Comprehensive Review of DEFINE and All of its Components

Part 1. Introduction. Chapter 1 Why Use ODS? 3. Chapter 2 ODS Basics 13

... ) city (city, cntyid, area, pop,.. )

Regaining Some Control Over ODS RTF Pagination When Using Proc Report Gary E. Moore, Moore Computing Services, Inc., Little Rock, Arkansas

Arthur L. Carpenter California Occidental Consultants, Oceanside, California

DESCRIPTION OF THE PROJECT

ABSTRACT MORE THAN SYNTAX ORGANIZE YOUR WORK THE SAS ENTERPRISE GUIDE PROJECT. Paper 50-30

Handling Numeric Representation SAS Errors Caused by Simple Floating-Point Arithmetic Computation Fuad J. Foty, U.S. Census Bureau, Washington, DC

Macro to compute best transform variable for the model

Making a SYLK file from SAS data. Another way to Excel using SAS

INTRODUCTION TO SAS HOW SAS WORKS READING RAW DATA INTO SAS

Summarizing Impossibly Large SAS Data Sets For the Data Warehouse Server Using Horizontal Summarization

It s not the Yellow Brick Road but the SAS PC FILES SERVER will take you Down the LIBNAME PATH= to Using the 64-Bit Excel Workbooks.

Greenspace: A Macro to Improve a SAS Data Set Footprint

Can you decipher the code? If you can, maybe you can break it. Jay Iyengar, Data Systems Consultants LLC, Oak Brook, IL

The 'SKIP' Statement

Macros from Beginning to Mend A Simple and Practical Approach to the SAS Macro Facility

PharmaSUG Paper PO12

shortcut Tap into learning NOW! Visit for a complete list of Short Cuts. Your Short Cut to Knowledge

Format-o-matic: Using Formats To Merge Data From Multiple Sources

A Side of Hash for You To Dig Into

SD10 A SAS MACRO FOR PERFORMING BACKWARD SELECTION IN PROC SURVEYREG

An Easy Route to a Missing Data Report with ODS+PROC FREQ+A Data Step Mike Zdeb, FSL, University at Albany School of Public Health, Rensselaer, NY

Setting the Percentage in PROC TABULATE

SAS Styles ODS, Right? No Programming! Discover a Professional SAS Programming Style That Will Last a Career

QUERIES BY ODS BEGINNERS. Varsha C. Shah, Dept. of Biostatistics, UNC-CH, Chapel Hill, NC

Easing into Data Exploration, Reporting, and Analytics Using SAS Enterprise Guide

Using SAS/SCL to Create Flexible Programs... A Super-Sized Macro Ellen Michaliszyn, College of American Pathologists, Northfield, IL

Tales from the Help Desk 6: Solutions to Common SAS Tasks

So Much Data, So Little Time: Splitting Datasets For More Efficient Run Times and Meeting FDA Submission Guidelines

Reading in Data Directly from Microsoft Word Questionnaire Forms

SESUG 2014 IT-82 SAS-Enterprise Guide for Institutional Research and Other Data Scientists Claudia W. McCann, East Carolina University.

Post-Processing.LST files to get what you want

Quality Control of Clinical Data Listings with Proc Compare

Using SAS Macros to Extract P-values from PROC FREQ

SAS Programming Techniques for Manipulating Metadata on the Database Level Chris Speck, PAREXEL International, Durham, NC

Indenting with Style

Quick Data Definitions Using SQL, REPORT and PRINT Procedures Bradford J. Danner, PharmaNet/i3, Tennessee

A SAS Solution to Create a Weekly Format Susan Bakken, Aimia, Plymouth, MN

What Do You Mean My CSV Doesn t Match My SAS Dataset?

Make SAS Enterprise Guide Your Own. John Ladds Statistics Canada Paper

Customized Flowcharts Using SAS Annotation Abhinav Srivastva, PaxVax Inc., Redwood City, CA

Automating the Production of Formatted Item Frequencies using Survey Metadata

ABSTRACT INTRODUCTION MACRO. Paper RF

PharmaSUG China. Systematically Reordering Axis Major Tick Values in SAS Graph Brian Shen, PPDI, ShangHai

An Application of ODS Tagsets. Notice! Paper

A Visual Step-by-step Approach to Converting an RTF File to an Excel File

Creating a Custom Report in Quality Window 5

Facilitate Statistical Analysis with Automatic Collapsing of Small Size Strata

Using GSUBMIT command to customize the interface in SAS Xin Wang, Fountain Medical Technology Co., ltd, Nanjing, China

Essential ODS Techniques for Creating Reports in PDF Patrick Thornton, SRI International, Menlo Park, CA

IF there is a Better Way than IF-THEN

LST in Comparison Sanket Kale, Parexel International Inc., Durham, NC Sajin Johnny, Parexel International Inc., Durham, NC

Journey to the center of the earth Deep understanding of SAS language processing mechanism Di Chen, SAS Beijing R&D, Beijing, China

ODS for PRINT, REPORT and TABULATE

Transcription:

An Efficient Method to Create a Large and Comprehensive Codebook Wen Song, ICF International, Calverton, MD Kamya Khanna, ICF International, Calverton, MD Baibai Chen, ICF International, Calverton, MD ABSTRACT At the conclusion of many data management projects, providing a client with an informative codebook is a common task for data managers or SAS programmers. When faced with a particularly large dataset containing thousands of variables or when faced with clients who demand many details about their variables, even the most experienced programmers might find this to be a daunting task. This paper provides a method of using SAS to create a customized codebook that includes the names, labels, types, values, formatted values, frequencies and percentages for every variable in a dataset. INTRODUCTION Codebooks are informative data summary reports which provide non-sas user clients a comprehensive view of their data, and they often play a significant role in a project. When you have several projects in hand, and each of them requires you to create a brand new codebook all within a short amount of time it is easy to feel overwhelmed. Upon closer look at these different codebooks, we might find that although the output format for the codebooks may vary from simple Rich text file to complicated HTML output, most codebooks consist of very similar data information: labels, types, values, frequencies, and basic statistical information (Figure1 demonstrates a simple codebook). These consistencies allow developers to write a modifiable code that is independent of such knowledge as what data sets to use or what variables exist in the data sets prior to execution. Thanks to the capability of MACRO Language within SAS system and the employment of PROC FORMAT, we can create a code that may be applied to any datasets with minimum revision, and produce codebooks in a matter of minutes. In this paper, we will outline a few simple steps and a couple of handy tricks to do just that. Figure 1. Codebook in Excel format TRICK 1: CHOOSE THE BEST METHOD TO CREATE MACRO VARIABLES After classifying your variables into three or four general categories (e.g. numeric variables with specified formats, continuous or discrete numeric variables which are requested to display statistical information, and character variables which only need simple frequencies and percentages for missing and non-missing values), it becomes clear that a certain set of operations (PROC FREQ or PROC MEANS) must be performed on a series of many variables. We need to use MACROs to do the job. 1

PROC SQL? PROBABLY NOT! Most programmers have a preferred way to create MACRO variables and I am no different. So when I started to write my code, the first thing occurred to me was the PROC SQL function. By using PROC SQL, I placed all variables name into the MACRO variable &names1, and created &nvar to detect how many variables are in the data set. The MACRO variable &name is created by the MACRO function %SCAN, which is a means to extract every variable name out from &names1. A MACRO do loop then executes the PROC FREQ procedure for every variable which belongs to the first category and inserts variable name to the output data set that is generated. proc sql noprint; select name into:names1 separated by '#'from a1; select count(name) into :nvar from a1; quit; %MACRO freq; %do i=1 %to &nvar; %let name=%scan(&names1,&i,'#'); proc freq data=mmpst; tables &name / missing out=&name; data &name; set &name; length var $20; var= &name ; %freq Output 1. Partial Output from the MACRO code above ENS16v COUNT PERCENT var A 15 29.4118 ENS16v C 6 11.7647 ENS16v 1 29 56.8627 ENS16v 2 1 1.9608 ENS16v The program succeeded in performing this first step. The values, frequencies and variable names are in the output as requested, but I soon found that there are certain limitations in this code. When I wanted to follow the same strategy to create another MACRO variable called label to store variable label information, I got an unwelcome warning: The quoted string currently being processed has become more than 262 characters long. You might have unbalanced quotation marks. This is not even to mention the lengthy error message I got when I tried to apply %SCAN function to this MACRO variable! I knew I could get the label information by merging the PROC FREQ output with PROC CONTENTS output, but I was unwilling to try this relatively clumsy method since my goal is to create a clear and concise MACRO code. CALL SYMPUT DATA DEFINED MACRO VARIABLES? A SENSIBLE CHOICE! After several experiments, I discovered that a series of operations or procedures can be easily applied to several different data sets by using &&var&i operations. The advantages of using CALL SYMPUT DATA defined MACRO variables include not only assigning a sequential number to each variable or each data set, but also providing a convenient way to store the compound character values. In the following program, the order of observation _N_ is used to assign a sequential number to each variable name, label and format. The total number of observations is also retained in &nvar. Note that the resolution to &&fformat&i is not same as the format value for variable &&varname&i; it is actually a compound value, i.e. a concatenation of format value and the character. A MACRO do loop executes PROC FREQ for every variable that belongs to the first category, then inserts variable name, variable label and formatted value to the output data set which is generated by PROC FREQ, and finally formats the count and percentage values. 2

%MACRO count1; data _null_; set index1; call symput(compress("varname" _N_),trim(name)); call symput(compress("nvar"),compress(_n_)); call symput(compress("vlabel" _N_),trim(label)); call symput(compress("fformat" _N_),compress(format) '.'); %do i=1 %to &nvar; proc freq data=noformat noprint; tables &&varname&i/missing out=&&varname&i; data &&varname&i(drop=count percent &&varname&i); set &&varname&i; length label$100 varname no per $20 ; fvalue=&&varname&i; format fvalue &&fformat&i; varname="&&varname&i"; var=&&varname&i; label="&&vlabel&i"; no=put(count,8.); per=compress(put(percent,5.1)) "%"; %count1 Output 2. Partial Output from the MACRO code above label varname no per fvalue var ENS16v - Groups address - Local wellness ENS16v 15 29.4% SKIPPED A ENS16v - Groups address - Local wellness ENS16v 6 11.8% NOT ANSWERED C ENS16v - Groups address - Local wellness ENS16v 29 56.9% YES 1 ENS16v - Groups address - Local wellness ENS16v 1 2.0% NO 2 The output is almost perfect! Not only do we have nice labels in the data set, but we also have the formatted values of the data. This code is good enough to use for creating a codebook with separated tables for all variables (Output 2 is a good example of this type codebook). However if your goal is to create an integrated codebook such as the one in Figure 1, the program above still has to incorporate one small modification. TRICK 2: GENERATING NEW VARIABLES WITH PROC FORMAT When you look at the Output 2 above, it is hard to figure out what there is left to adjust in the code. But if you try to set all of the output data sets together, the problems become clear - the formatted values look very strange. The following output shows the result of setting two variables together. Since those two variables have different formats, the formatted values depicted here are inappropriate. 3

Output 3. Output from setting Ens16v and Ens17 together. Note that the values for fvalue are inappropriate. label varname no per fvalue var ENS16v-Groups address - Local wellness ENS16v 15 29.4% SKIPPED A ENS16v-Groups address - Local wellness ENS16v 6 11.8% NOT ANSWERED C ENS16v-Groups address - Local wellness ENS16v 29 56.9% YES 1 ENS16v-Groups address - Local wellness ENS16v 1 2.0% NO 2 ENS17-Meetings during past 12 months ENS17 15 29.4% SKIPPED A ENS17-Meetings during past 12 months ENS17 5 9.8% NOT ANSWERED C ENS17-Meetings during past 12 months ENS17 6 11.8% NO 2 ENS17-Meetings during past 12 months ENS17 14 27.5% 3 3 ENS17-Meetings during past 12 months ENS17 7 13.7% 4 4 ENS17-Meetings during past 12 months ENS17 4 7.8% 5 5 At first I was stumped, trying to figure out how to resolve this problem. I asked myself, should I use an IF/THEN/ELSE code to assign the appropriate values to the variables? After some research and discussion with my colleagues, I quickly rejected this idea since that is quite a tedious process. Instead I found out that there is in fact a very simple way to generate new variables using PROC FORMAT - a method which seems too simple to be true. The PUT function calls the Format &&format&i to generate a new variable called fvalue, like so: fvalue=put(&&varname&i,&&fformat&i); Output 4. Output from setting Ens16v and Ens17 together. Note that fvalue now has appropriate values. label varname no per fvalue var ENS16v-Groups address - Local wellness ENS16v 15 29.4% SKIPPED A ENS16v-Groups address - Local wellness ENS16v 6 11.8% NOT ANSWERED C ENS16v-Groups address - Local wellness ENS16v 29 56.9% YES 1 ENS16v-Groups address - Local wellness ENS16v 1 2.0% NO 2 ENS17-Meetings during past 12 months ENS17 15 29.4% SKIPPED A ENS17-Meetings during past 12 months ENS17 5 9.8% NOT ANSWERED C ENS17-Meetings during past 12 months ENS17 6 11.8% 1 OR 2 TIMES 2 ENS17-Meetings during past 12 months ENS17 14 27.5% 3 OR 4 TIMES 3 ENS17-Meetings during past 12 months ENS17 7 13.7% 5 OR 6 TIMES 4 ENS17-Meetings during past 12 months ENS17 4 7.8% MORE THAN 6 TIMES 5 The complete code is now shown below with the %IF/%THEN %DO loop added to the program to SET all of the output data sets together. %MACRO count1; data _null_; set index1; call symput(compress("varname" _N_),trim(name)); call symput(compress("nvar"),compress(_n_)); call symput(compress("vlabel" _N_),trim(label)); call symput(compress("fformat" _N_),compress(format) '.'); 4

%do i=1 %to &nvar; proc freq data=noformat noprint; tables &&varname&i/missing out=&&varname&i; data &&varname&i(drop=count percent &&varname&i); set &&varname&i; length fvalue $50 label$100 varname no per $20 ; fvalue=put(&&varname&i,&&fformat&i); varname="&&varname&i"; var=&&varname&i; label="&&vlabel&i"; no=put(count,8.); per=compress(put(percent,5.1)) "%"; %count1 FORMAT THE OUTPUT The final step to create a codebook is to apply the appropriate OUTPUT FORMATs to the data set that we have already generated. PDF, EXCEL or RTF are all good choices for a printable copy of a codebook, as well as for wider electronic distribution. There are many different ways of enhancing the appearance and readability of codebooks. One way is to create a web output that provides readers with additional links to access more detailed information about the variables in the codebook. As Figure 2 shows below, the variable names are used as the links to the variable information. When you click the variable name you want to explore, you are directed to the appropriate destination where a table containing even more detailed variable information is located. To further enhance the HTML output, you can use various style elements available in SAS. Alternatively, you can add a compute block along with a style statement. If you are very comfortable with HTML, you even can use Hypertext Markup Language directly in SAS. The Appendix shows a complete code to get a codebook that resembles the style depicted in Figure 2 by embedding Hypertext Markup Language in SAS program. CONCLUSION This paper serves as an introduction to generating clean and comprehensive codebooks that contain information for a large number of variables. The programs presented in this paper can be easily modified to fit the needs of different users. I hope to have provided sufficient explanation for even a beginner level SAS programmer to learn how to handle creating such an extensive report. Although my methods are basic, they are extremely useful even in an advanced user s day-to-day programming. 5

APPENDIX SAMPLE CODES TO GENERATE HYPERLINK EMBEDDED HTML OUTPUT %MACRO output; data _null_; set AF; call symput(compress("varname" _N_),trim(name)); call symput(compress("nvar"),compress(_n_)); call symput(compress("vlabel" _N_),trim(label)); varlabellen=length(compress(label)); call symput(compress("varlabellen" _N_),trim(varlabellen)); /* HTML - Write header */ %let tab='09'x; file "C:\Documents and Settings\codebook.html"; put "<html>"; put "<head>"; put "<h1>het2 CODEBOOK</h1>"; put "</head>"; put "<body>"; put "<p>"; put '<table border cellspacing="0" cellpadding="2">'; %do i=1 %to &nvar ; put &tab. "<tr>"; put &tab. '<td align="left"><a href="#' "&&&varname&i." '">' "&&&varname&i." '</a></td>'; %if &&&varlabellen&i.>0 %then %do; put &tab. &tab. "<td>&&&vlabel&i.</td>"; %else %do; put &tab. &tab. "<td><i>(this variable has no label.)</i></td>"; put &tab. "</tr>"; put "</table>"; /* HTML - Write each variable */ %do i=1 %to &nvar ; file "C:\Documents and Settings\wsong\Desktop\New Codebook\codebook.html" mod; put "<p>"; put '<big><b><a name="' "&&&varname&i." '"' ">&&&varname&i.</a></b></big>"; put "<br><br>"; %if &&&varlabellen&i.>0 %then %do; put "Label: &&&vlabel&i"; %else %do; put "<i>(this variable has no label.)</i>"; put "<br><br>"; DATA T (drop=varnum name); SET &&varname&i; RUN; proc sql noprint; select count(flength) into: lastobs from T; quit; SET T; 6

file "C:\Documents and Settings\wsong\Desktop\New Codebook\codebook.html" mod; if _N_=1 THEN DO; put '<table border cellspacing="0" cellpadding="2">'; put &tab. "<tr>"; put &tab. &tab. "<th>length</th>"; put &tab. &tab. "<th>label</th>"; put &tab. &tab. "<th>value</th>"; put &tab. &tab. "<th>formated Value</th>"; put &tab. &tab. "<th>count</th>"; put &tab. &tab. "<th>percent</th>"; put &tab. "</tr>"; END; put &tab. &tab. '<td align="left">' flength '</td>'; put &tab. &tab. '<td align="left">' label '</td>'; put &tab. &tab. '<td align="left">' value '</td>'; put &tab. &tab. '<td align="left">' fvalue '</td>'; put &tab. &tab. '<td align="left">' no '</td>'; put &tab. &tab. '<td align="left">' per '</td>'; put &tab. "</tr>"; if _N_=&lastobs then do; put "</table>"; END; file "C:\Documents and Settings\wsong\Desktop\New Codebook\codebook.html" mod; put "</body>"; put "</html>"; options notes; %output ACKNOWLEDGMENTS I would like to thank my boss Tonja Kyle and our client (CDC/OID/NCHHSTP) for providing me the challenges over the years. I also want to thank my colleagues Kamya Khanna and Baibai Chen, who are the co-authors for this paper, for their helpful discussions. SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. indicates USA registration. CONTACT INFORMATION Your comments and questions are valued and encouraged. Contact the author at: Name: Wen Song Company: ICF International Address: 11785 Beltsville Dr, suite 300 City, State ZIP: Beltsville, MD, 20705 Work Phone: 301-572-0963 E-mail: wen.song@icfi.com 7