A Format to Make the _TYPE_ Field of PROC MEANS Easier to Interpret Matt Pettis, Thomson West, Eagan, MN

Similar documents
PROC MEANS for Disaggregating Statistics in SAS : One Input Data Set and One Output Data Set with Everything You Need

Creating Macro Calls using Proc Freq

Using PROC REPORT to Cross-Tabulate Multiple Response Items Patrick Thornton, SRI International, Menlo Park, CA

Using Templates Created by the SAS/STAT Procedures

Macro to compute best transform variable for the model

Paper DB2 table. For a simple read of a table, SQL and DATA step operate with similar efficiency.

PharmaSUG 2013 CC26 Automating the Labeling of X- Axis Sanjiv Ramalingam, Vertex Pharmaceuticals, Inc., Cambridge, MA

Validation Summary using SYSINFO

Submitting SAS Code On The Side

The Ugliest Data I ve Ever Met

SAS Programming Techniques for Manipulating Metadata on the Database Level Chris Speck, PAREXEL International, Durham, NC

EXAMPLE 3: MATCHING DATA FROM RESPONDENTS AT 2 OR MORE WAVES (LONG FORMAT)

Paper CC-016. METHODOLOGY Suppose the data structure with m missing values for the row indices i=n-m+1,,n can be re-expressed by

Want to Do a Better Job? - Select Appropriate Statistical Analysis in Healthcare Research

Cleaning Duplicate Observations on a Chessboard of Missing Values Mayrita Vitvitska, ClinOps, LLC, San Francisco, CA

A SAS Solution to Create a Weekly Format Susan Bakken, Aimia, Plymouth, MN

Quick Data Definitions Using SQL, REPORT and PRINT Procedures Bradford J. Danner, PharmaNet/i3, Tennessee

An Efficient Method to Create Titles for Multiple Clinical Reports Using Proc Format within A Do Loop Youying Yu, PharmaNet/i3, West Chester, Ohio

Identifying Duplicate Variables in a SAS Data Set

Planting Your Rows: Using SAS Formats to Make the Generation of Zero- Filled Rows in Tables Less Thorny

A Cross-national Comparison Using Stacked Data

A Breeze through SAS options to Enter a Zero-filled row Kajal Tahiliani, ICON Clinical Research, Warrington, PA

SAS Macro Dynamics - From Simple Basics to Powerful Invocations Rick Andrews, Office of the Actuary, CMS, Baltimore, MD

Displaying Multiple Graphs to Quickly Assess Patient Data Trends

A SAS Macro Utility to Modify and Validate RTF Outputs for Regional Analyses Jagan Mohan Achi, PPD, Austin, TX Joshua N. Winters, PPD, Rochester, NY

Paper HOW-06. Tricia Aanderud, And Data Inc, Raleigh, NC

Arthur L. Carpenter California Occidental Consultants, Oceanside, California

%MISSING: A SAS Macro to Report Missing Value Percentages for a Multi-Year Multi-File Information System

Tales from the Help Desk 6: Solutions to Common SAS Tasks

Give me EVERYTHING! A macro to combine the CONTENTS procedure output and formats. Lynn Mullins, PPD, Cincinnati, Ohio

Poster Frequencies of a Multiple Mention Question

ABSTRACT: INTRODUCTION: WEB CRAWLER OVERVIEW: METHOD 1: WEB CRAWLER IN SAS DATA STEP CODE. Paper CC-17

Creating Code writing algorithms for producing n-lagged variables. Matt Bates, J.P. Morgan Chase, Columbus, OH

Using SAS Macros to Extract P-values from PROC FREQ

One Project, Two Teams: The Unblind Leading the Blind

DATA Step Debugger APPENDIX 3

Chapter 6: Modifying and Combining Data Sets

Gary L. Katsanis, Blue Cross and Blue Shield of the Rochester Area, Rochester, NY

Exploring the SAS Macro Function %SYSFUNC

Tweaking your tables: Suppressing superfluous subtotals in PROC TABULATE

Using PROC SQL to Calculate FIRSTOBS David C. Tabano, Kaiser Permanente, Denver, CO

3N Validation to Validate PROC COMPARE Output

The DMSPLIT Procedure

Implementing external file processing with no record delimiter via a metadata-driven approach

Create a Format from a SAS Data Set Ruth Marisol Rivera, i3 Statprobe, Mexico City, Mexico

EXAMPLE 3: MATCHING DATA FROM RESPONDENTS AT 2 OR MORE WAVES (LONG FORMAT)

A Macro to Keep Titles and Footnotes in One Place

Automated Checking Of Multiple Files Kathyayini Tappeta, Percept Pharma Services, Bridgewater, NJ

A SAS Macro for Producing Benchmarks for Interpreting School Effect Sizes

USING DATA TO SET MACRO PARAMETERS

Getting it Done with PROC TABULATE

A Side of Hash for You To Dig Into

Quick and Efficient Way to Check the Transferred Data Divyaja Padamati, Eliassen Group Inc., North Carolina.

The Proc Transpose Cookbook

ABSTRACT DATA CLARIFCIATION FORM TRACKING ORACLE TABLE INTRODUCTION REVIEW QUALITY CHECKS

Procedures. Calls any BMDP program to analyze data in a SAS data set

Summarizing Impossibly Large SAS Data Sets For the Data Warehouse Server Using Horizontal Summarization

Data Quality Review for Missing Values and Outliers

Pruning the SASLOG Digging into the Roots of NOTEs, WARNINGs, and ERRORs

So Much Data, So Little Time: Splitting Datasets For More Efficient Run Times and Meeting FDA Submission Guidelines

SD10 A SAS MACRO FOR PERFORMING BACKWARD SELECTION IN PROC SURVEYREG

Get SAS sy with PROC SQL Amie Bissonett, Pharmanet/i3, Minneapolis, MN

The Dataset Diet How to transform short and fat into long and thin

Ranking Between the Lines

Unlock SAS Code Automation with the Power of Macros

Statistics and Data Analysis. Common Pitfalls in SAS Statistical Analysis Macros in a Mass Production Environment

Useful Tips When Deploying SAS Code in a Production Environment

Posters. Workarounds for SASWare Ballot Items Jack Hamilton, First Health, West Sacramento, California USA. Paper

How to Create Data-Driven Lists

PharmaSUG China. Systematically Reordering Axis Major Tick Values in SAS Graph Brian Shen, PPDI, ShangHai

The Basics of PROC FCMP. Dachao Liu Northwestern Universtiy Chicago

Amie Bissonett, inventiv Health Clinical, Minneapolis, MN

Foundations and Fundamentals. SAS System Options: The True Heroes of Macro Debugging Kevin Russell and Russ Tyndall, SAS Institute Inc.

SAS Model Manager 15.1: Quick Start Tutorial

Divide and Conquer Writing Parallel SAS Code to Speed Up Your SAS Program

KEYWORDS Metadata, macro language, CALL EXECUTE, %NRSTR, %TSLIT

Tired of CALL EXECUTE? Try DOSUBL

Matt Downs and Heidi Christ-Schmidt Statistics Collaborative, Inc., Washington, D.C.

A SAS Macro to Generate Caterpillar Plots. Guochen Song, i3 Statprobe, Cary, NC

Leveraging SAS Visualization Technologies to Increase the Global Competency of the U.S. Workforce

%MAKE_IT_COUNT: An Example Macro for Dynamic Table Programming Britney Gilbert, Juniper Tree Consulting, Porter, Oklahoma

SAS Graphs in Small Multiples Andrea Wainwright-Zimmerman, Capital One, Richmond, VA

Essential ODS Techniques for Creating Reports in PDF Patrick Thornton, SRI International, Menlo Park, CA

Exporting Variable Labels as Column Headers in Excel using SAS Chaitanya Chowdagam, MaxisIT Inc., Metuchen, NJ

Correcting for natural time lag bias in non-participants in pre-post intervention evaluation studies

DBLOAD Procedure Reference

Paper An Automated Reporting Macro to Create Cell Index An Enhanced Revisit. Shi-Tao Yeh, GlaxoSmithKline, King of Prussia, PA

SAS Drug Development Program Portability

Automating Preliminary Data Cleaning in SAS

Hidden in plain sight: my top ten underpublicized enhancements in SAS Versions 9.2 and 9.3

A Few Quick and Efficient Ways to Compare Data

Choosing the Right Procedure

Paper CC16. William E Benjamin Jr, Owl Computer Consultancy LLC, Phoenix, AZ

Storing and Reusing Macros

Preserving your SAS Environment in a Non-Persistent World. A Detailed Guide to PROC PRESENV. Steven Gross, Wells Fargo, Irving, TX

The Power of PROC SQL Techniques and SAS Dictionary Tables in Handling Data

How to Keep Multiple Formats in One Variable after Transpose Mindy Wang

Introduction. Getting Started with the Macro Facility CHAPTER 1

Creating Zillions of Labels (and Other Documents) the Easy Way with ODS and Microsoft Word

Uncommon Techniques for Common Variables

Transcription:

Paper 045-29 A Format to Make the _TYPE_ Field of PROC MEANS Easier to Interpret Matt Pettis, Thomson West, Eagan, MN ABSTRACT: PROC MEANS analyzes datasets according to the variables listed in its Class statement. Its computed _TYPE_ variable can be used to determine which class variables were used for the analysis variable calculation. It can be very difficult to determine by inspection of the _TYPE_ variable which class variables were used in a calculation of any given row. The %TypeFormat macro takes the CLASS variable list and creates a format that associates the values of the _TYPE_ variable with a string listing the variables used in the calculation separated by the '*' character. INDTRODUCTION: The purpose of this paper is to present a macro that creates a format which formats the _TYPE_ variable in a PROC MEANS output of SAS from a number into a descriptive string. PROBLEMS READING THE _TYPE_ VARIABLE Consider the following scenario: Every day, a PROC MEANS with five class variables is run on a dataset, and the result is appended to a base of daily PROC MEANS outputs. An analyst is asked to run a report on the base for a certain three class variable combination. How does the analyst retrieve the correct data for the report? PROC MEANS provides a solution to this problem via the _TYPE_ variable. A group of rows with identical _TYPE_ values indicates that the same variables were used in calculating the analysis variables, and each row with this _TYPE_ value represents a different combination of the level of the variables. Rows with different _TYPE_ variables indicate that different combinations of variables were used. While this is a mathematically efficient way to categorize the analysis, it is a difficult value to use in practice. In order to interpret the value of the _TYPE_ variable, the analyst must be familiar with binary mathematics. Directions on how to interpret the meaning of the _TYPE_ variable may be found in the SAS9 Language Reference. The analyst must be able to convert the _TYPE_ value into binary, then use that result to pick out the relevant analysis variables from the class statement. A solution to this problem is to use the %TypeFormat macro (code included), which associates the _TYPE_ variable value to a string that lists the classification variables used in a particular calculation. For example, consider the following output of a simple fictitious dataset that has the location, home directory, and HTTP response code of each request to a server: Figure 1: PROC MEANS and output No _TYPE_ formatting Code: PROC MEANS data=work missing noprint; class Location Directory ResponseCode; output out=temp; run; PROC PRINT data=temp; Title '_TYPE_ WITHOUT formatting'; run; 1

Output: _TYPE_ WITHOUT formatting Response Obs _TYPE_ Location Directory Code _FREQ_ 1 0 30 2 1 1 3 1 200 11 4 1 404 18 5 2 /Downloads 12 6 2 /Graphics 18 7 3 /Downloads 1 8 3 /Downloads 200 6 9 3 /Downloads 404 5 10 3 /Graphics 200 5 11 3 /Graphics 404 13 12 4 SiteA 13 13 4 SiteB 17 14 5 SiteA 1 15 5 SiteA 200 4 16 5 SiteA 404 8 17 5 SiteB 200 7 18 5 SiteB 404 10 19 6 SiteA /Downloads 5 20 6 SiteA /Graphics 8 21 6 SiteB /Downloads 7 22 6 SiteB /Graphics 10 23 7 SiteA /Downloads 1 24 7 SiteA /Downloads 200 2 25 7 SiteA /Downloads 404 2 26 7 SiteA /Graphics 200 2 27 7 SiteA /Graphics 404 6 28 7 SiteB /Downloads 200 4 29 7 SiteB /Downloads 404 3 30 7 SiteB /Graphics 200 3 31 7 SiteB /Graphics 404 7 2

In this output, the _TYPE_=0 value identifies the summary with no classification variables, the _TYPE_=1 values identifies the summary by the ResponseCode variable, _TYPE_=2 identifies the summary by the Directory variable, _TYPE_=3 identifies the summary by Directory and ResponseCode variables, and so on through _TYPE_=7. Notice how difficult it can be to determine which CLASS variables were used for analysis in each row, especially if there are missing values, as is the case with one of the Response codes. The %TypeFormat macro creates a format that associates values of the _TYPE_ variable with a string of the variable names used in the analysis separated by a '*' character (see Appendix A for the macro code). In order to do this, the %TypeFormat must be called with two arguments: one being the name of the format that will be created (formatname=), and the other being the list of variables in the Class statement of the PROC MEANS in exactly the same order as it appears in the PROC MEANS statement (var=): Figure 2: PROC MEANS and output _TYPE_ formatting. Code: %TypeFormat(formatname=testtyp,var=Location Directory ResponseCode); PROC PRINT data=temp; var _type_ Location Directory ResponseCode _freq_; format _type_ testtyp.; title '_TYPE_ WITH formatting'; run; Output: _TYPE_ WITH formatting Response Obs _TYPE_ Location Directory Code _FREQ_ 1 30 2 ResponseCode 1 3 ResponseCode 200 11 4 ResponseCode 404 18 5 Directory /Downloads 12 6 Directory /Graphics 18 7 Directory*ResponseCode /Downloads 1 8 Directory*ResponseCode /Downloads 200 6 9 Directory*ResponseCode /Downloads 404 5 10 Directory*ResponseCode /Graphics 200 5 11 Directory*ResponseCode /Graphics 404 13 12 Location SiteA 13 13 Location SiteB 17 14 Location*ResponseCode SiteA 1 15 Location*ResponseCode SiteA 200 4 16 Location*ResponseCode SiteA 404 8 17 Location*ResponseCode SiteB 200 7 18 Location*ResponseCode SiteB 404 10 19 Location*Directory SiteA /Downloads 5 20 Location*Directory SiteA /Graphics 8 21 Location*Directory SiteB /Downloads 7 22 Location*Directory SiteB /Graphics 10 23 Location*Directory*ResponseCode SiteA /Downloads 1 24 Location*Directory*ResponseCode SiteA /Downloads 200 2 25 Location*Directory*ResponseCode SiteA /Downloads 404 2 26 Location*Directory*ResponseCode SiteA /Graphics 200 2 27 Location*Directory*ResponseCode SiteA /Graphics 404 6 28 Location*Directory*ResponseCode SiteB /Downloads 200 4 29 Location*Directory*ResponseCode SiteB /Downloads 404 3 30 Location*Directory*ResponseCode SiteB /Graphics 200 3 31 Location*Directory*ResponseCode SiteB /Graphics 404 7 3

CONCLUSION Notice how much easier the Figure 2 format is to read than Figure 1. The variables that are involved in the calculation are now clearly identified and the levels of these variables can now be read easily from the appropriate columns. This also makes identifying missing data easier, as the CLASS variable used is listed even if its data value is missing. REFERENCES SAS Institue Inc. (2002), SAS9 Language Reference: Dictionary, Volumes 1 and 2. Cary, NC: SAS Institute Inc. SAS Institue Inc. (2002), SAS9 Macro Language: Reference. Cary, NC: SAS Institute Inc. CONTACT INFORMATION Matt Pettis Thomson West Group 610 Opperman Dr. Eagan, MN 55417 +1 (651) 848-3976 Matt.pettis@thomson.com SAS and all other SAS Institute Inc. product or service names are registered trademarks of trademarks of SAS Institute Inc. in the USA and other countries. indicates USA registration. Other brand and product names are trademarks of their respective companies. 4

Appendix A: %TypeFormat SAS code: %macro TypeFormat(formatname=typefmt,var=x1 x2 x3 x4); /* Count the number of variables, put into var_count */ %local var_count; %let var_count = 1; %do %until (%scan(&var,&var_count) eq); %let var_count = %eval(&var_count+1); % %let var_count = %eval(&var_count-1); /* Assign each variable name to an indexed macro &&var_val&i */ %local i; %let i = %eval(&var_count); %do %until (&i <= 0); %local var_val&i; %let var_val&i = %scan(&var,&i); %let i = %eval(&i-1); % /* Create temp dataset to use as format */ data _tmp; keep label start fmtname type; retain fmtname "&formatname" type 'n'; length label $ 256 sep $ 1; sep = '*'; * Separator character; /* Loop through the type combinations */ type_iter = 2**(&var_count) - 1; * Loop through the types; var_iter = &var_count; * Loop over the binary digits; do start = 0 to type_iter; * Type iteration loop; i_tmp = start; label = ''; do j = var_iter to 1 by (-1); * Binary digit loop; bin_digit = int(i_tmp/(2**(j-1))); if bin_digit = 1 then do; * Get appropriate variable name; x = symget('var_val' left(trim(put(&var_count - j + 1,3.)))); * Add the separator to the string; x = left(trim(x)) sep; * Append selected variable name to label string; newlabel = trim(label) x; * Reassign label as newlabel; label = newlabel; * Decrement i_tmp if bin_digit is in types binary representation; i_tmp = i_tmp - 2**(j-1); label = left(label); * justify label test to left; len = length(label); * Take off separator that was appended to label = substr(label,1,len-1); output; stop; run; * create the format from _tmp dataset; proc format cntlin=_tmp; run; * Delete _tmp dataset to clean up work library; proc datasets; delete _tmp; quit; run; %mend TypeFormat; 5