Setting the Percentage in PROC TABULATE

Similar documents
Writing Reports with the

Introducing a Colorful Proc Tabulate Ben Cochran, The Bedford Group, Raleigh, NC

It s Proc Tabulate Jim, but not as we know it!

Getting it Done with PROC TABULATE

Going Beyond Proc Tabulate Jim Edgington, LabOne, Inc., Lenexa, KS Carole Lindblade, LabOne, Inc., Lenexa, KS

Using PROC TABULATE and ODS Style Options to Make Really Great Tables Wendi L. Wright, CTB / McGraw-Hill, Harrisburg, PA

Art Carpenter California Occidental Consultants

Tabulating Patients, Admissions and Length-of-Stay By Dx Category, Fiscal Year, County and Age Group

SAS Macros for Grouping Count and Its Application to Enhance Your Reports

Run your reports through that last loop to standardize the presentation attributes

%MAKE_IT_COUNT: An Example Macro for Dynamic Table Programming Britney Gilbert, Juniper Tree Consulting, Porter, Oklahoma

DSCI 325: Handout 10 Summarizing Numerical and Categorical Data in SAS Spring 2017

Using PROC SQL to Generate Shift Tables More Efficiently

%ANYTL: A Versatile Table/Listing Macro

Getting Up to Speed with PROC REPORT Kimberly LeBouton, K.J.L. Computing, Rossmoor, CA

ODS for PRINT, REPORT and TABULATE

Chapter 6 Creating Reports. Chapter Table of Contents

Christopher Toppe, Ph.D. Computer Sciences Corporation

Square Peg, Square Hole Getting Tables to Fit on Slides in the ODS Destination for PowerPoint

ODS in an Instant! Bernadette H. Johnson, The Blaze Group, Inc., Raleigh, NC

Proc Tabulate: Extending This Powerful Tool Beyond Its Limitations

A Macro to replace PROC REPORT!?

Tweaking your tables: Suppressing superfluous subtotals in PROC TABULATE

PharmaSUG China 2018 Paper AD-62

An Easy Route to a Missing Data Report with ODS+PROC FREQ+A Data Step Mike Zdeb, FSL, University at Albany School of Public Health, Rensselaer, NY

There s No Such Thing as Normal Clinical Trials Data, or Is There? Daphne Ewing, Octagon Research Solutions, Inc., Wayne, PA

Chapters 18, 19, 20 Solutions. Page 1 of 14. Demographics from COLLEGE Data Set

SAS Online Training: Course contents: Agenda:

Paper S Data Presentation 101: An Analyst s Perspective

PROC MEANS for Disaggregating Statistics in SAS : One Input Data Set and One Output Data Set with Everything You Need

Basic Concepts #6: Introduction to Report Writing

Easing into Data Exploration, Reporting, and Analytics Using SAS Enterprise Guide

The REPORT Procedure CHAPTER 32

Multiple Graphical and Tabular Reports on One Page, Multiple Ways to Do It Niraj J Pandya, CT, USA

Making an RTF file Out of a Text File, With SAS Paper CC13

From Manual to Automatic with Overdrive - Using SAS to Automate Report Generation Faron Kincheloe, Baylor University, Waco, TX

PDF Accessibility: How SAS 9.4M5 Enables Automatic Production of Accessible PDF Files

Quality Control of Clinical Data Listings with Proc Compare

Producing Summary Tables in SAS Enterprise Guide

Moving Data and Results Between SAS and Excel. Harry Droogendyk Stratia Consulting Inc.

Getting Started with the SGPLOT Procedure

Creating output datasets using SQL (Structured Query Language) only Andrii Stakhniv, Experis Clinical, Ukraine

A Practical Introduction to SAS Data Integration Studio

A Breeze through SAS options to Enter a Zero-filled row Kajal Tahiliani, ICON Clinical Research, Warrington, PA

ABSTRACT INTRODUCTION TRICK 1: CHOOSE THE BEST METHOD TO CREATE MACRO VARIABLES

SAS Example A10. Output Delivery System (ODS) Sample Data Set sales.txt. Examples of currently available ODS destinations: Mervyn Marasinghe

An Introduction to PROC REPORT

INTRODUCTION TO SAS HOW SAS WORKS READING RAW DATA INTO SAS

I AlB 1 C 1 D ~~~ I I ; -j-----; ;--i--;--j- ;- j--; AlB

A SAS Macro for Generating Informative Cumulative/Point-wise Bar Charts

Using Proc Freq for Manageable Data Summarization

Professional outputs with ODS LATEX

Example1D.1.sas. * Procedures : ; * 1. print to show the dataset. ;

Making a SYLK file from SAS data. Another way to Excel using SAS

Ditch the Data Memo: Using Macro Variables and Outer Union Corresponding in PROC SQL to Create Data Set Summary Tables Andrea Shane MDRC, Oakland, CA

ODS DOCUMENT, a practical example. Ruurd Bennink, OCS Consulting B.V., s-hertogenbosch, the Netherlands

Lab #3. Viewing Data in SAS. Tables in SAS. 171:161: Introduction to Biostatistics Breheny

Turn In: A copy of the first 50 lines or so of the converted text file.

Analysis of Complex Survey Data with SAS

Let s Get FREQy with our Statistics: Data-Driven Approach to Determining Appropriate Test Statistic

Exam Questions A00-281

ABSTRACT INTRODUCTION THE ODS TAGSET FACILITY

Data Quality Control: Using High Performance Binning to Prevent Information Loss

Introduction to SAS Procedures SAS Basics III. Susan J. Slaughter, Avocet Solutions

Print the Proc Report and Have It on My Desktop in the Morning! James T. Kardasis, J.T.K. & Associates, Skokie, IL

A Fully Automated Approach to Concatenate RTF outputs and Create TOC Zhiping Yan, Covance, Beijing, China Lugang Xie, Merck, Princeton, US

SAS Application to Automate a Comprehensive Review of DEFINE and All of its Components

CMISS the SAS Function You May Have Been MISSING Mira Shapiro, Analytic Designers LLC, Bethesda, MD

Exporting Variable Labels as Column Headers in Excel using SAS Chaitanya Chowdagam, MaxisIT Inc., Metuchen, NJ

How to Go From SAS Data Sets to DATA NULL or WordPerfect Tables Anne Horney, Cooperative Studies Program Coordinating Center, Perry Point, Maryland

ABSTRACT INTRODUCTION PROBLEM: TOO MUCH INFORMATION? math nrt scr. ID School Grade Gender Ethnicity read nrt scr

Uncommon Techniques for Common Variables

Descriptive Statistics: Using Indicator Variables

Multiple Facts about Multilabel Formats

Using PROC REPORT to Cross-Tabulate Multiple Response Items Patrick Thornton, SRI International, Menlo Park, CA

Creating Population Tree Charts (Using SAS/GRAPH Software) Robert E. Allison, Jr. and Dr. Moon W. Suh College of Textiles, N. C.

Same Data Different Attributes: Cloning Issues with Data Sets Brian Varney, Experis Business Analytics, Portage, MI

Introduction to SAS Procedures SAS Basics III. Susan J. Slaughter, Avocet Solutions

A SAS Macro Utility to Modify and Validate RTF Outputs for Regional Analyses Jagan Mohan Achi, PPD, Austin, TX Joshua N. Winters, PPD, Rochester, NY

Using a Fillable PDF together with SAS for Questionnaire Data Donald Evans, US Department of the Treasury

Introduction to Statistical Graphics Procedures

PivotTables & Charts for Health

Creating Complex Reports Cynthia L. Zender, SAS Institute, Inc., Cary, NC

Part 1. Introduction. Chapter 1 Why Use ODS? 3. Chapter 2 ODS Basics 13

ABSTRACT INTRODUCTION WHERE TO START? 1. DATA CHECK FOR CONSISTENCIES

Six Cool Things You Can Do In Display Manager Jenine Milum, Charlotte, NC Wachovia Bank

Creating Forest Plots Using SAS/GRAPH and the Annotate Facility

TLF Management Tools: SAS programs to help in managing large number of TLFs. Eduard Joseph Siquioco, PPD, Manila, Philippines

Super boost data transpose puzzle

A Practical and Efficient Approach in Generating AE (Adverse Events) Tables within a Clinical Study Environment

PharmaSUG China. model to include all potential prognostic factors and exploratory variables, 2) select covariates which are significant at

Anyone Can Learn PROC TABULATE, v2.0

Data Quality Control for Big Data: Preventing Information Loss With High Performance Binning

SC-15. An Annotated Guide: Using Proc Tabulate And Proc Summary to Validate SAS Code Russ Lavery, Contractor, Ardmore, PA

Indenting with Style

Using Templates Created by the SAS/STAT Procedures

Let Hash SUMINC Count For You Joseph Hinson, Accenture Life Sciences, Berwyn, PA, USA

CC13 An Automatic Process to Compare Files. Simon Lin, Merck & Co., Inc., Rahway, NJ Huei-Ling Chen, Merck & Co., Inc., Rahway, NJ

Using SAS Macros to Extract P-values from PROC FREQ

Introduction to SAS: General

Transcription:

SESUG Paper 193-2017 Setting the Percentage in PROC TABULATE David Franklin, QuintilesIMS, Cambridge, MA ABSTRACT PROC TABULATE is a very powerful procedure which can do statistics and frequency counts very efficiently, but it also it has the capability of calculating percentages on many levels for a category. This paper looks at the automatic percentage calculations that are provided, and then delves into how a user can specify the denominator for your custom percentage. INTRODUCTION PROC TABULATE tends to be one of the procedures that many programmers forget about, too often overlooked by a series of PROC FREQs, PROC MEANs and finally PROC REPORT. The one feature of PROC TABULATE is the ability to do both frequency and statistical calculations on the data, and report in a tabular form, a bit clunky at times, but is still an option that every SAS programmer should have under their belt. This paper takes a quick look at the basics of the PROC TABULATE then quickly introduces percentages, taking a look at the automatics, then setting custom percentage calculations, then finish off with a quick look at a trick where a Missing category may be present, but the percentage should only be on non-missing categories. It must be noted that the examples in this paper use the ODS LISTING output destination, but could be easily converted into nice looking output using ODS PDF, ODS RTF or ODS HTML output destinations that allow for substantial formatting and use of proportional fonts. THE BASICS One of the advantages of using PROC TABULATE is that it will do frequency and statistical analysis, as well a produce a readable output. The minimum syntax needed to produce a table is PROC TABULATE DATA=dataset; CLASS categorical_variable(s); VAR numeric_variable(s); TABLES <<<page dimension,>row dimension,>column dimension>; When looking at using PROC TABULATE for a table, take a look at first the also known as the classification variables -- these are the ones that are going to be placed in the CLASS statement. Second, decide on any analysis variables from which statistics, not frequency counts, will be produced -- these are stated in the VAR statement. The last step is defining the layout in the TABLES statement which defines where the CLASS and VAR variables go, what statistics are to be calculated on VAR variables, and finally formatting and labels. The best way to see show the CLASS, VAR and TABLES statements come together is see an example. 1

PROC TABULATE DATA=sashelp.cars FORMCHAR=" ---- + ---" F=6.; CLASS make type; TABLE make,type; WHERE origin='usa' AND type IN('SUV','Sedan'); In the example at line one is the PROC TABULATE statement with a FORMCHAR option that sets the characters for the borders of the output, and a F= option that sets the default output column format for the result columns. The second line lists the variables for the categorical analysis, in this case MAKE and TYPE. The third line is the TABLE statement which sets up the dimensions to the table as well as calls for statistics, formatting and labels -- in the example we are just setting up a simple cross tabulation count with the variable MAKE as the ROW dimension, and the variable MAKE as the column dimension. The dataset contains numerous other information but is being restricted by the WHERE statement selecting on two types of cars coming from the USA. The output is given below: As stated previously, this is just a simple frequency count; we are going to step this up a notch by looking at some automatically generated percentages. THE AUTOMATICS SAS by default will calculate a number of percentages automatically. These are: REPPCTN and REPPCTSUM -- percentage of the value in a single table cell in relation to the total of the values in the report. COLPCTN and COLPCTSUM -- percentage of the value in a single table cell in relation to the total of the values in the column. 2

ROWPCTN and ROWPCTSUM -- percentage of the value in a single table cell in relation to the total of the values in the row. PAGEPCTN and PAGEPCTSUM -- percentage of the value in a single table cell in relation to the total of the values in the page. In each case the PCTN variables are in relation to the proportion of total counts, while the PCTSUM variables relate to the proportion of the total sum. Our next example will take the previous example and add a percentage based on each row total, and use an ALL option in the TABLE statement to add the type column frequency results together, not outputting the percentage in the ALL group (which will add up to 100% so is redundant in this case). PROC TABULATE DATA=sashelp.cars FORMCHAR=" ---- + ---" F=6.; CLASS make type; TABLE make, (type*(n ROWPCTN='%'*F=6.1)) ALL; WHERE origin='usa' AND type IN('SUV','Sedan'); In the row dimension, things remain the same. However in the column dimension things that changed -- the TYPE variable is now broken into counts (n) and row percentages (ROWPCTN), and for the percentage we are setting the column label to read % and the default format is 6.1. The output will look similar to that given below: The next example uses the COLPCTN variable by setting the percentage by the column total. Lets look at the code first, but this time the CLASS dataset from the SASHELP directory is being used: 3

DATA _class; SET sashelp.class; agegrp=age; LABEL agegrp='age Group'; PROC TABULATE DATA=_class FORMHAR=" ---- + ---" F=6.; CLASS sex agegrp; VAR age; TABLES (age*(n MEAN*F=6.1 MIN MAX)) agegrp*(n COLPCTN='%'*F=6.1),sex /RTS=30; The datastep just takes the value of AGE and copies it to a new variable AGEGRP, the reason being that in this example a basic statistical analysis is done on age along with a frequency analysis, by SEX, in the singe PROC TABULATE call -- this is often call stacking. In the CLASS statement the variables SEX and AGEGRP are being treated as frequency class variables, while AGE is being set as numerical type variable. The first line of the TABLES statement sets up an analysis in the row dimension for an AGE analysis computing the N, MEAN, MIN and MAX values, setting a format for the MEAN of 6.1 (the default for the table is 6.0). The second line of the TABLE statement sets up the frequency analysis calculating the percentage based on the column total, with a format of 6.1. The variable SEX is used as the column dimension, and the RTS option sets aside a width of 30 characters to the field that will contain the labels. Output from the code is shown below: One problem with the example above is that the percentage is based on all categories, meaning that if a Missing category exists in your output table then it is going to be used in the denominator calculation. Do 4

not fret, there is a way to trick PROC TABULATE into not including the Missing category, but we are going to have to create our own new dataset: DATA _class; LENGTH subnum sex 8; INFILE CARDS; INPUT subnum trt sex age @@; sexk=n(sex); *Set flag, 1=sex value present; LABEL trt='treatment' sex='gender' age='age'; CARDS; 1 2 1 25 2 1 2 28 3 1 1 46 4 2 1 24 5 1 1 63 6 2 1 47 7 1 2 56 8 2 1 23 9 2 1 43 10 1. 55 11 2 1 53 12 2 1 25 13 2 1 50 14 2 2 44 15 2 1 48 16 2 1 64 17 2 1 59 18 1 1 59 19 1 2 47 20 2 1 65 ; Now we have the data, notice that at the end of the second line, the record has a TRT value, no SEX value, and an AGE value. There also the line sexy=n(sex); which sets a flag 1 if the value fox SEX is present, else it will set to 0. This is used in the calculation of the percentage which will be seen shortly. The next step is set our format, as shown below: PROC FORMAT; VALUE trtf 1='Arm A' 2='Arm B'; VALUE sexf.='missing' 1='Male' 2='Female'; These set up decoded values for the coded variables -- note that for the format SEXF a missing values is set to MISSING. Next is the PROC TABULATE call, as shown below: PROC TABULATE DATA=_class MISSING FORMCHAR=" ---- + ---" F=6.; CLASS trt sex; VAR age sexk; TABLES age*(n MEAN*f=6.1 MIN MAX) sex*sexk='group'*(n COLPCTSUM='%'*F=6.1),trt ALL/RTS=30; FORMAT sex sexf. trt trtf.; The SEXK variable, as we defined above, is used as a numeric, and with the COLPCTSUM in the TABLES statement, only does the percentage calculation based on the sum of the SEX frequencies where SEXK is equal to 1, i.e. Male or Female. See the output below for the result of the PROC TABULATE call: 5

As a check, for Treatment Arm A, three males out of six males and females is 3*100/6=50%. SETTING YOUR OWN DENOMINATOR Above we were using the automatics, but it is possible to set your own denominator for the percentage. There are two formats for this approach: PCTN< var > -- Frequency counts, var being the variable to use as the denominator PCTSUM< var > -- Sums of data, var being the variable to use as the denominator This approach to the denominator is very useful if you have stacked tables as often the value to be used is not the column total or row total. The following example shows the case, going back to the CARS dataset that was earlier, where the denominator is the MAKE by TYPE: PROC TABULATE DATA=sashelp.cars FORMCHAR=" ---- + ---" F=6.; CLASS make type; TABLE make, (type*(n PCTN<make>='%'*F=6.1)) ALL; WHERE origin='usa' AND type IN('SUV','Sedan'); Note that the denominator in this case is the total cars in the MAKE category by type, thus the percentage total for the SUV type will be 100% as is the total for the Sedan type. In a stacked TABLES statement call, there may be different denominators for each row or column group. See below for the output from the PROC TABULATE call. 6

CONCLUSION Adding a percentage to the PROC TABULATE call is not as daunting as some would make the SAS programmer below. The use of the so called automatics assists in getting that percentage out to the report correctly, and even will allow for Missing categories to be counted without being used in the percentage. Also touched on was the ability to set your own denominator for the calculation of the percentage. CONTACT INFORMATION Your comments and questions are valued and encouraged. Contact the author at: David Franklin QuintilesIMS David.Franklin1@QuintilesIMS.com SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. indicates USA registration. Other brand and product names are trademarks of their respective companies. 7