Multiple Facts about Multilabel Formats

Gwen D. Babcock, New York State Department of Health, Troy, NY

ABSTRACT

PROC FORMAT is a powerful procedure that allows data to be viewed and summarized in various ways without creating new variables in the DATA step. The addition of the MULTILABEL option in SAS 8 made PROC FORMAT even more powerful by allowing formats to have duplicate and/or overlapping ranges. This paper gives examples of the use of MULTILABEL formats with the /mlf option in PROC MEANS, PROC SUMMARY, and PROC TABULATE. In addition, it addresses what happens when MULTILABEL formats are used in other procedures or without the /mlf option. It also shows how MULTILABEL formats, like other formats, can be created from an existing dataset. MULTILABEL formats are a valuable addition to any SAS programmer's repertoire.

INTRODUCTION

Many times, the way data are stored in a dataset is not the way you wish to display them. In this case, PROC FORMAT can be used to group and recode variables. Using PROC FORMAT, you can label variable values. The labels are then displayed in procedure output instead of the underlying values. Traditional formats allow only one label per data value. In contrast, multilabel formats allow more than one label per data value. This allows for the creation of overlapping groups (such as children and teenagers) and total summaries for data analysis. The drawback of multilabel formats is that they can be used with only a limited number of procedures. Three procedures that support them are PROC MEANS, PROC SUMMARY, and PROC TABULATE. This paper shows how to use multilabel formats, with special emphasis on frequency analysis, and what happens when you forget and use a multilabel format for ordinary analysis. The output is from SAS 9.1.2 running in Windows XP.

HOW TO USE FORMATS AND MULTILABEL FORMATS

This is an example of a format. The original data are coded as either 1 or 2, representing Male and Female, respectively.
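    data nesug;
      input sex $1.;
      * four males (1), two females (2), and one missing value,
        matching the counts in the output that follows;
      datalines;
    1
    1
    2
    1
    .
    2
    1
    ;
    run;

    proc format;
      value $sex '1'='Male'
                 '2'='Female';
    run;

    proc freq data=nesug;
      table sex;
      format sex $sex.;
    run;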

Here is the output. The 1s and 2s have been changed to Male and Female.

    The FREQ Procedure

                                     Cumulative    Cumulative
    sex       Frequency    Percent    Frequency       Percent
    ---------------------------------------------------------
    Male              4      66.67            4         66.67
    Female            2      33.33            6        100.00

                     Frequency Missing = 1

What if we wanted to display both the number of males and females and the total number of people? We could use a multilabel format, like this:

    proc format;
      value $gender (multilabel)
        '1'='Male'
        '2'='Female'
        '1','2',' '='Total people';
    run;

But this format gives exactly the same results as the previous one when we run PROC FREQ. Why? Because PROC FREQ doesn't understand multilabel formats. Only PROC MEANS, PROC SUMMARY, and PROC TABULATE can use multilabel formats. We can use PROC MEANS to determine the frequencies we want:

    proc means data=nesug missing;
      class sex /mlf;
      format sex $gender.;
      *when all variables are character variables, proc means
       produces a simple count of observations;
    run;

Note that in addition to using our new multilabel format, we must also use the MLF option in the CLASS statement. This option tells the procedure that the format for the class variable is multilabel. Here is the output:

    The MEANS Procedure

    sex             N Obs
    ---------------------
    Female              2
    Male                4
    Total people        7
    ---------------------

PROC SUMMARY is very similar to PROC MEANS. However, PROC MEANS prints output by default, while PROC SUMMARY does not. If you omit the VAR statement, PROC MEANS analyzes all numeric variables not included in other statements, while PROC SUMMARY produces a count.
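Because PROC SUMMARY prints nothing by default, its counts are usually written to an output dataset and then printed. The following is a minimal sketch of that pattern, assuming the nesug dataset and the $gender multilabel format from the examples above. The output dataset name sex_counts is a hypothetical name chosen for illustration, and the NWAY option is added here only to suppress the overall summary row.

```sas
/* Sketch: count observations per multilabel category with PROC SUMMARY. */
/* Assumes the nesug dataset and the $gender. multilabel format exist.   */
/* sex_counts is a hypothetical output dataset name.                     */
proc summary data=nesug missing nway;
  class sex / mlf;          /* MLF is still required for multilabel formats */
  format sex $gender.;
  output out=sex_counts;    /* no VAR statement: _FREQ_ holds the count     */
run;

/* PROC SUMMARY produces no printed output, so print the dataset. */
proc print data=sex_counts noobs;
  var sex _freq_;
run;
```

In this sketch the _FREQ_ variable carries the number of observations for each label, which plays the same role as the N Obs column in the PROC MEANS output.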

Multilabel formats can also be used in PROC TABULATE. Here is some example code and its output:

    proc tabulate data=nesug missing;
      class sex /mlf;
      /*the CLASS statement identifies character variables*/
      /*a var statement would identify numeric variables, if we had any*/
      table sex,n;
      format sex $gender.;
    run;

    -----------------------------------
    |                    |     N      |
    |--------------------+------------|
    |sex                 |            |
    |--------------------|            |
    |Female              |        2.00|
    |--------------------+------------|
    |Male                |        4.00|
    |--------------------+------------|
    |Total people        |        7.00|
    -----------------------------------

Like PROC MEANS, PROC TABULATE uses the MLF option in the CLASS statement to tell the procedure that the format for the class variable is multilabel.

TO ERR IS HUMAN

Multilabel formats can be used with PROC MEANS, PROC SUMMARY, and PROC TABULATE. All three of these procedures require the MLF option in order to use multilabel formats. But what happens when we forget these three little letters? How does it affect our results?

TO FORGIVE IS SAS

Consider the following code, where we define a multilabel format and then fail to use the MLF option in PROC MEANS. The output follows:

    proc format;
      value $gender (multilabel)
        '1'='Male'
        '2'='Female'
        '1','2',' '='Total people';
    run;

    proc means data=nesug;
      class sex;
      format sex $gender.;
    run;

    The MEANS Procedure

    sex         N Obs
    -----------------
    Male            4
    Female          2
    -----------------

It appears that SAS is forgiving; even though our totals are not displayed, we still get the correct values for males and females. Let us explore further to see whether this is a general rule. Consider the following code, which is similar to the previous example, except that we change the order of the entries in the PROC FORMAT VALUE statement.

    proc format;
      value $gender (multilabel)
        '1','2',' '='Both sexes'
        '1'='Male'
        '2'='Female';
    run;

    proc means data=nesug missing;
      class sex;
      format sex $gender.;
    run;

    The MEANS Procedure

    sex           N Obs
    -------------------
    Both sexes        7
    -------------------

Changing the order in the VALUE statement of PROC FORMAT gives completely different results. SAS now reports the total population rather than counting by sex. SAS appears to assign to each data value the first label it encounters in the format, and to ignore all subsequent labels.

TO NOT FORGIVE IS ALSO SAS

Consider this code, in which we again change the order in the VALUE statement of PROC FORMAT:

    proc format;
      value $gender (multilabel)
        '1'='Male'
        '1','2',' '='Both sexes'
        '2'='Female';
    run;

    proc means data=nesug missing;
      class sex;
      format sex $gender.;
    run;

    The MEANS Procedure

    sex           N Obs
    -------------------
    Both sexes        3
    Male              4
    -------------------

In this case, we get incorrect results because we used a multilabel format without using the MLF option in the CLASS statement. SAS appears to first assign the label Male to the value 1, then assign the label Both sexes to 2 and to the blank value; it cannot use this label for 1, because 1 already has a label. Similarly, once 2 is assigned the label Both sexes, it cannot subsequently be assigned the label Female. The 3 observations reported under Both sexes are therefore the 2 females plus the 1 missing value.

Therefore, when writing multilabel formats, one should consider whether these formats may be accidentally or intentionally used without the MLF option or in procedures that do not support multilabel formats. If so, consideration should be given to the order of the assignments in the VALUE statement so that the desired results (or at least not incorrect results) are produced.

NUMERIC MULTILABEL FORMATS USED WITHOUT /MLF

SAS handles numeric multilabel formats differently than character multilabel formats. Consider the following code, which is designed to break up observations into overlapping age groups and provide counts in each age group:

    data nesug2;
      input age;
      datalines;
    5
    10
    3
    54
    80
    ;
    run;

    proc format;
      value age (multilabel)
        1-4='Preschool'
        1-18='Children'
        19-120='Adults';
    run;

    proc means data=nesug2 n;
      title "Results using the /MLF option";
      class age /mlf;
      var age;
      format age age.;
    run;

    proc means data=nesug2 n;
      title "Results without the /MLF option";
      class age;
      var age;
      format age age.;
    run;
    title;

    Results using the /MLF option

    The MEANS Procedure

    Analysis Variable : age

    age          N Obs    N
    Adults           2    2
    Children         3    3
    Preschool        1    1

    Results without the /MLF option

    The MEANS Procedure

    Analysis Variable : age

    age          N Obs    N
    Preschool        1    1
    Children         2    2
    Adults           2    2

Without the /MLF option, the procedure gives incorrect results, since the preschoolers are not included in the Children category. Can we fix this by changing the order of our statements in PROC FORMAT? Consider the following code, where we put the Adults category first, followed by Children and then Preschool:

    /*does order matter?*/
    proc format;
      value age (multilabel)
        19-120='Adults'
        1-18='Children'
        1-4='Preschool';
    run;

    proc means data=nesug2 n;
      class age;
      var age;
      format age age.;
    run;

Here are the results:

    The MEANS Procedure

    Analysis Variable : age

    age          N Obs    N
    Preschool        1    1
    Children         2    2
    Adults           2    2

It appears that SAS sorts the format ranges by value and then assigns them. So it doesn't matter what order you type the labels in, unless you use the NOTSORTED option in PROC FORMAT. Consider this example:

    /*it's the internal order that's important*/
    proc format;
      value age (multilabel notsorted)
        19-120='Adults'
        1-18='Children'
        1-4='Preschool';
    run;

    proc means data=nesug2 n;
      class age;
      var age;
      format age age.;
    run;

    The MEANS Procedure

    Analysis Variable : age

    age          N Obs    N
    Children         3    3
    Adults           2    2

Although these results lack the Preschool category, the results that do display are correct, even though we did not use the /MLF option in PROC MEANS. The NOTSORTED option in PROC FORMAT forces SAS to assign the values in the order we typed them, not in numerical sort order.

MULTILABEL FORMATS WITH CNTLIN DATASETS

Typing in long VALUE statements can be tedious. Fortunately, SAS provides a way to create formats from datasets using the CNTLIN= option in the PROC FORMAT statement. These datasets are called input control datasets, and each one can contain one or more formats. Consider the following example:

    data forformat;
      fmtname="$gend";
      hlo="M";
      format label $10.;
      start="1"; label="male";       output;
      start="2"; label="female";     output;
      start="1"; label="both sexes"; output;
      start="2"; label="both sexes"; output;
      start=" "; label="both sexes"; output;
    run;

The essential parts of the format are present in the dataset. The start variable contains the values of the variable that are to be formatted. The label variable tells how the start value should be displayed by SAS when the format is used. The fmtname variable contains the name of the format. One input control dataset can contain multiple formats. These variables are the same for both ordinary and multilabel formats. In addition to these variables, the variable hlo is used to tell SAS that this is a multilabel format: to declare a multilabel format, the hlo variable is set to M.

Once the dataset is complete, it can be used with the CNTLIN= option of PROC FORMAT to create the actual format. Then the format can be used as usual. Here is an example of how the previous dataset could be used:

    proc format cntlin=forformat;
    run;

    proc means data=nesug missing;
      class sex /mlf;
      format sex $gend.;
    run;

CONCLUSIONS

Multilabel formats are a powerful way to control how data are organized and displayed for presentation. They can be typed in or created using input control datasets. However, multilabel formats also have the potential to provide incorrect results if used improperly. Therefore, SAS programmers would do well to arrange the VALUE statements in PROC FORMAT so that if they do forget to use the /mlf option, or use a multilabel format in a procedure that does not support it, the results will still be correct, though incomplete. This provides an additional precaution against human error.

ACKNOWLEDGMENTS

Thanks to Thomas Talbot and Sanjaya Kumar for their thoughtful review of this paper.

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the author at:

    Gwen D. Babcock
    New York State Department of Health
    547 River St
    Troy, NY 12180-2216
    Work Phone: 518-402-7785
    Fax: 518-402-7959
    Email: gdb02@health.state.ny.us

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies.