%Addval: A SAS Macro Which Completes the Cartesian Product of Dataset Observations for All Values of a Selected Set of Variables

Similar documents
Example. Section: PS 709 Examples of Calculations of Reduced Hours of Work Last Revised: February 2017 Last Reviewed: February 2017 Next Review:

Calendar Excel Template User Guide

CALENDAR OF FILING DEADLINES AND SEC HOLIDAYS

September 2015 Calendar This Excel calendar is blank & designed for easy use as a planner. Courtesy of WinCalendar.com

Scheduling. Scheduling Tasks At Creation Time CHAPTER

AIMMS Function Reference - Date Time Related Identifiers

Institute For Arts & Digital Sciences SHORT COURSES

The Demystification of a Great Deal of Files

Basic Device Management

Creating Macro Calls using Proc Freq

Calendar PPF Production Cycles Non-Production Activities and Events

ControlLogix/Studio 5000 Logix Designer Course Code Course Title Duration CCP143 Studio 5000 Logix Designer Level 3: Project Development 4.

YEAR 8 STUDENT ASSESSMENT PLANNER SEMESTER 1, 2018 TERM 1

name name C S M E S M E S M E Block S Block M C-Various October Sunday

Nortel Enterprise Reporting Quality Monitoring Meta-Model Guide

Epidemiology Principles of Biostatistics Chapter 3. Introduction to SAS. John Koval

TEMPLATE CALENDAR2015. facemediagroup.co.uk

So Much Data, So Little Time: Splitting Datasets For More Efficient Run Times and Meeting FDA Submission Guidelines

Note: The enumerations range from 0 to (number_of_elements_in_enumeration 1).

Backup Exec Supplement

44 Tricks with the 4mat Procedure

View a Students Schedule Through Student Services Trigger:

ID-AL - SCHEDULER V2.x - Time-stamped & dated programming Scheduler - Manual SCHEDULER. V2.x MANUAL

Cambridge English Dates and Fees for 2018

HP Project and Portfolio Management Center

Regaining Some Control Over ODS RTF Pagination When Using Proc Report Gary E. Moore, Moore Computing Services, Inc., Little Rock, Arkansas

MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.

2016 Calendar of System Events and Moratoriums

INFORMATION TECHNOLOGY SPREADSHEETS. Part 1

PharmaSUG Paper TT11

MAP OF OUR REGION. About

SOUTH DAKOTA BOARD OF REGENTS. Board Work ******************************************************************************

CHIROPRACTIC MARKETING CENTER

General Instructions

BHARATI VIDYAPEETH`S INSTITUTE OF MANAGEMENT STUDIES AND RESEARCH NAVI MUMBAI ACADEMIC CALENDER JUNE MAY 2017

1 of 8 10/10/2018, 12:52 PM RM-01, 10/10/2018. * Required. 1. Agency Name: * 2. Fiscal year reported: * 3. Date: *

QI TALK TIME. Run Charts. Speaker: Dr Michael Carton. Connect Improve Innovate. 19 th Dec Building an Irish Network of Quality Improvers

Schedule/BACnet Schedule

Instrument Parts I began in Access by selecting the Create ribbon, then the Table Design button.

Withdrawn Equity Offerings: Event Study and Cross-Sectional Regression Analysis Using Eventus Software

MONITORING REPORT ON THE WEBSITE OF THE STATISTICAL SERVICE OF CYPRUS DECEMBER The report is issued by the.

Drawing Courses. Drawing Art. Visual Concept Design. Character Development for Graphic Novels

KENYA 2019 Training Schedule

MONDAY TUESDAY WEDNESDAY THURSDAY FRIDAY SATURDAY SUNDAY

MAP OF OUR REGION. About

Data Edit-checks Integration using ODS Tagset Niraj J. Pandya, Element Technologies Inc., NJ Vinodh Paida, Impressive Systems Inc.

From Manual to Automatic with Overdrive - Using SAS to Automate Report Generation Faron Kincheloe, Baylor University, Waco, TX

Perceptive Enterprise Deployment Suite

EACH MONTH CUTTING EDGE PEER REVIEW RESEARCH ARTICLES ARE PUBLISHED

B.2 Measures of Central Tendency and Dispersion

Final Project. Analyzing Reddit Data to Determine Popularity

MBTA Semester Pass Program User Guide

Your Local Family Bus Company The Bus Garage, Bromyard, HR7 4NT

The Power of Combining Data with the PROC SQL

Standard Operating Procedure. Talent Management. Talent Acquisition. Class 5 Talent Acquisition Process

Year 1 and 2 Mastery of Mathematics

Vector Semantics. Dense Vectors

SOUTH DAKOTA BOARD OF REGENTS. Board Work ******************************************************************************

Enters system mode. Example The following example creates a scheduler named maintenancesched and commits the transaction:

CLOVIS WEST DIRECTIVE STUDIES P.E INFORMATION SHEET

What s next? Are you interested in CompTIA A+ classes?

CSE 305 Programming Languages Spring, 2010 Homework 5 Maximum Points: 24 Due 10:30 AM, Friday, February 26, 2010

Math in Focus Vocabulary. Kindergarten

Table of Contents 1 Basic Configuration Commands 1-1

Employer Portal Guide. BenefitWallet Employer Portal Guide

typedef int Array[10]; String name; Array ages;

Helping You C What You Can Do with SAS

Computer Grade 5. Unit: 1, 2 & 3 Total Periods 38 Lab 10 Months: April and May

AP Statistics Assignments Mr. Kearns José Martí MAST 6-12 Academy

Using SAS/SCL to Create Flexible Programs... A Super-Sized Macro Ellen Michaliszyn, College of American Pathologists, Northfield, IL

Reverse Segmentable GainMaker Node OIB Shorting Condition Technical Bulletin

SOUTH DAKOTA BOARD OF REGENTS PLANNING SESSION AUGUST 8-9, Below is a tentative meeting schedule for the Board meetings in 2013:

A SAS Macro Utility to Modify and Validate RTF Outputs for Regional Analyses Jagan Mohan Achi, PPD, Austin, TX Joshua N. Winters, PPD, Rochester, NY

Quicker Than Merge? Kirby Cossey, Texas State Auditor s Office, Austin, Texas

Data Quality Review for Missing Values and Outliers

Conditional Formatting

Last Name First Name M.I. Individual NPI. Group/Clinic Name as it appears on W9 (if applicable) TIN Taxonomy

Are Your SAS Programs Running You?

Run your reports through that last loop to standardize the presentation attributes

Crystal Reports. Overview. Contents. Cross-Tab Capabilities & Limitations in Crystal Reports (CR) 6.x

Exercises Software Development I. 02 Algorithm Testing & Language Description Manual inspection, test plan, grammar, metasyntax notations (BNF, EBNF)

Why Jan 1, 1960? See:

Guernsey Post 2013/14. Quality of Service Report

Aerospace Software Engineering

INTENT TO FILE (ITF)

Table of Contents 1 Basic Configuration Commands 1-1

DDOS-GUARD Q DDoS Attack Report

COMPUTER TRAINING CENTER

Maryland Corn: Historical Basis and Price Information

%MAKE_IT_COUNT: An Example Macro for Dynamic Table Programming Britney Gilbert, Juniper Tree Consulting, Porter, Oklahoma

Chapter 6 Reacting to Player Input

CIMA Certificate BA Interactive Timetable

Genesys Info Mart. date-time Section

LIHP Monthly Aggregate Reporting Instructions Manual. Low Income Health Program Evaluation

NetworX Series. NX-507E RELAY EXPANDER NX-508E OUTPUT EXPANDER Installation and Startup

Useful Tips When Deploying SAS Code in a Production Environment

A SAS Macro to Create Validation Summary of Dataset Report

Going Under the Hood: How Does the Macro Processor Really Work?

NetworX Series NX-1710E Single Door Control. Installation and Startup Manual

Installation Manual GENERAL DESCRIPTION...2 WIRING INFORMATION FOR NX-507 AND NX NX-507 TERMINAL DESCRIPTION...3 NX-507 DRAWING...

Transcription:

%Addval: A SAS Macro Which Completes the Cartesian Product of Dataset Observations for All Values of a Selected Set of Variables Rich Schiefelbein, PRA International, Lenexa, KS ABSTRACT It is often useful in clinical programming to create additional observations to add to an existing dataset as placeholders. These additional observations then complete the Cartesian product of the combinations of all values for a selected set of variables. For example, the output dataset from the PROC FREQ in a SAS analysis of count data does not include observations for combinations of variables that are not present in the input data, but may still be of interest (i.e. the result dataset does not include an observation with a count value of 0 for Caucasian/Females if that particular combination is not present in the input data). This macro is a very flexible tool which allows one to quickly and easily create additional combination observations in a dataset. Not only will it create observations for all combinations of values for a selected group of existing variables, but entirely new variables, with predetermined values as needed, may also be created in the output dataset. The final resulting dataset will then include the Cartesian product of all values for the set of variables specified as well as any additional placeholders needed. INTRODUCTION There are many instances in clinical trial programming where it is useful to form the cartestian product of the values of a set of variables in a dataset. One instance is in the filling out of an output dataset from PROC FREQ in SAS. The macro described in this paper provides a quick and flexible way to produce the desired result; a dataset containing observations for all combinations of all values of the specified variables. Additional values not present in the input dataset can be user specified prior to forming the Cartesian product. MACRO INPUT DATA The input to this macro can be any dataset containing more than one key variable (the Cartesian product of all values for the key variables will be formed as the output data). Data=Temp.dummy1 Gender Day Month Count Male Monday January 4 Male Monday February 2 Male Tuesday March 1 Male Tuesday May 6 Male Tuesday June 4 Male Wednesday July 3 Male Wednesday July 5 Male Thursday July 1 Male Thursday September 3 Male Saturday September 2 Male Saturday October 2 Male Saturday November 3 Male Saturday November 5 Male Sunday November 4 Female Monday March 8 Female Tuesday March 2 Female Tuesday May 1 Female Friday July 3 Female Friday August 7 Female Sunday August 6 Female Sunday December 5 In this output from Proc Freq, many different combinations of the variables Sex, Day, and Month are not present (for example, the particular combination of Male/Monday/March is not present because no subject in the sample had that particular combination of values). In a summary of this data, it would likely be of interest to explicitly show that for combinations not present in the database, no subjects met the criteria (i.e. show explicitly that zero subjects met the criteria of Male/Monday/March). It may also be of interest to explicitly show that other values (not present in the input database AT ALL) for the key variables of Gender, Day, and Month had no subjects present. For example, the month of April is not present at all in the input dataset, and we may want to explicitly illustrate this in a summary. Though in this case we would likely want to assign a value of zero to the variable count for any newly created observations, any numeric value can be specified using the %addval macro. For example, someone investigating any relationship between the gender of a subject, and the day of week or the month of the year of their birthday may produce an output dataset from PROC FREQ as:

MACRO PARAMETERS INDATA= OUTDATA= BYVAR= SETVAR= SETVAL= ADDVAL X = Specifies the name of the macro input dataset Specifies the name of the macro output dataset -Specifies the key variables in the input dataset -String of variable names each separated by a space -Specifies the non-key variables to be assigned any user-specified numeric value (only newly created observations will be assigned this value) -String of variable names each separated by a space Specifies the numeric value to be assigned to each variable specified in the SETVAR variable (only newly created observations) -Variables which contains any values not present in the input dataset which are to be added to the dataset prior to forming the Cartesian product. -Addval1 should contain the values to be added to the first variable in the BYVAR string, Addval2 should contain the values to be added to the second variable in the BYVAR string, etc. -String of values each separated by a space Note: Only the variables INDATA, OUTDATA, and BYVAR are required. All other variables are optional. SAMPLE MACRO CALL #1 A basic call to complete the Cartesian product of the example dataset temp.dummy1 (above) would appear as: %addval (indata=dummy1, outdata=dummy2, byvar=gender Day Month, setvar=count, setval=0); This call will produce a dataset called dummy2 which will contain all of the original observations in dummy1 plus additional observations representing all possible combinations of the values of the BYVAR variables Gender, Day, and Month. The SETVAR and SETVAL variables specify that for any newly created observations, the value of the variable COUNT should be set to a value of zero. The dataset dummy2 will contain a total of 154 observations (2 levels of Gender X 7 levels of Day X 11 levels of Month). Note that since the month of April does not appear at all in the input dataset, it does not appear at all in the output dataset. SAMPLE MACRO CALL #2 In order to create observations containing values of any of the variables specified in the BYVAR string which are not present at all in the input dataset (such as the month of April), additional variables can be specified in the macro call. %addval (indata=dummy1, outdata=dummy2, byvar=gender Day Month, setvar=count, setval=0, addval3=april); This call will produce a dataset called dummy2 which will contain all of the original observations in dummy1 plus additional observations representing all possible combinations of the values of the BYVAR variables Gender, Day, and Month. Prior to forming the Cartesian product of the BYVAR values, however, any values specified in an addval variable are added to the dataset. Thus, the value of April is added to the variable Month (since addval3 is specified and Month is the third variable in the BYVAR string), and the resulting dataset will contain a total of 168 observations (2 levels of Gender X 7 levels of Day X 12 levels of Month). Again, the SETVAR and SETVAL variables specify that for any newly created observations, the value of the variable COUNT should be set to a value of zero. CONCLUSION The macro presented in this paper is an effective tool in allowing clinical data programmers work more efficiently. The type of situation illustrated is a common one in preparing summary tables in the pharmaceutical industry. The macro can be used in any situation where it is useful to add additional observations to an existing dataset from combinations of existing dataset variable values and user specified values. CONTACT INFORMATION Your comments and questions are valued and encouraged. Contact the author at: Rich Schiefelbein PRA International 16400 College Blvd. Lenexa, KS 66219 Work Phone: 913-577-2782 Fax: 913-599-0344 Email: Schiefelbeinrich@praintl.com Web: TRADEMARK INFORMATION SAS is a registered trademark of the SAS Institute, Inc. in the USA and other countries.

APPENDIX (SOURCE CODE) %macro addval(indata=,outdata=,byvar=,setvar=,setval=, addval1=, addval2=, addval3=, addval4=, addval5=, addval6=, addval7=, addval8=, addval9=, addval10=, addval11=,addval12=,addval13=,addval14=,addval15=, addval16=,addval17=,addval18=,addval19=,addval20=); %let miss=; /* Create a macro variable containing the number of variables present in the 'byvar' string */ data info1; length bystring $200; bystring="&byvar"; /* value of 'x' in the resulting dataset 'info2' contains the number of words in string */ data info2(keep=x bystring); length y $200; set info1; x=1; do until(y=' '); y=scan(bystring,x,' '); x=x+1; x=x-2; set info2; call symput('numbyvar',x); /* Define macro variable references for each byvar */ /* One macro variable string is defined for each variable in the 'byvar' string */ data bydata(keep=var2 var4); length var1 var3 var4 $200; do i=1 to &numbyvar; var1='byvar'; var2=i; var3=put(var2,3.0); var4=left(trim(var1)) left(trim(var3)); output; set bydata; call symput(var4,' '); /* output null values initially */ length byvarc byvarc1 newvar $200; set bydata; byvarc="&byvar"; byvarc1=left(trim(byvarc)); newvar=scan(byvarc1,var2,' '); call symput(var4,newvar); /* output macro strings for byvars */ /************************************************************************/ /* The following section of code will define the text */ /* strings for the variables to be 'added' to the byvars */ /* */ /* The form of the string is an output statement. For example: */ /* */ /* Day='Tuesday'; output; Month='April'; output; */ /* */ /* These strings are used to create a dataset containing the */ /* values for all byvars which need to be added to the original */ /* dataset. This dataset is then set into the original dataset */ /* prior to forming the 'Cartesian product'. */ /*************************************************************************/ data passadd1(keep=var3 var5 var6); length var1 var2 var4 var5 var6 $200; do i=1 to &numbyvar; var1='addval'; var2='byvar'; var3=i; var4=put(var3,3.0); var5=left(trim(var1)) left(trim(var4)); var6=left(trim(var2)) left(trim(var4)); output; data passadd2; length newvar1 newvar2 $200; set passadd1; newvar1=symget(var5); newvar2=symget(var6); /* value of 'x' in the resulting dataset 'passadd3' contains the number of words in string */ data passadd3; length y $200; set passadd2; x=1; do until(y=' '); y=scan(newvar1,x,' '); x=x+1; x=x-2;

proc printto name=work.temp.addvalx.output new; length wordval string1 $200; set passadd3; file print notitle; wordval=' '; string1=' '; do i=1 to x; wordval=scan(newvar1,i,' '); put newvar2"='"wordval"'; output;"; /* output text strings to text file named 'addvalx' */ proc printto; /* Add the user specified values to the byvars ('key' variables) */ data addval(keep=&byvar); set &indata; if _n_=1; /* use first obs to avoid creating missing values */ filename addvalx1 catalog 'work.temp.addvalx.output'; data addvaln; set addval end=eof; if eof then %include addvalx1; data newdata; set &indata addvaln; /* set new obs with the original dataset */ /* Build variable text strings to be used as sql statement */ /* -SQL statement used to create the Cartesian product for all 'key' variables */ /* -One string is created (and 'put' to a text file). This string contains the entire SQL statement */ proc printto name=work.temp.sqlx.output new; length newvar newvar1 save1 $200; file print notitle; newvar="&byvar"; newvar1=left(trim(newvar)); save1=newvar1; put 'proc sql'; put %str(';'); put 'create table all as select distinct'; mark=2; w=0; do until(mark = 0); w=w+1; /* created dummy dataset name references for each byvar */ if w=1 then letter = 'a'; if w=2 then letter = 'b'; if w=3 then letter = 'c'; if w=4 then letter = 'd'; if w=5 then letter = 'e'; if w=6 then letter = 'f'; if w=7 then letter = 'g'; if w=8 then letter = 'h'; if w=9 then letter = 'i'; if w=10 then letter = 'j'; if w=11 then letter = 'k'; if w=12 then letter = 'l'; if w=13 then letter = 'm'; if w=14 then letter = 'n'; if w=15 then letter = 'o'; if w=16 then letter = 'p'; if w=17 then letter = 'q'; if w=18 then letter = 'r'; if w=19 then letter = 's'; if w=20 then letter = 't'; mark=index(left(trim(newvar1)),' '); subvar1=substr(newvar1,1,mark-1); put letter'.'subvar1','; /* define the 'select' SQL statement */ if mark = 0 then do; put letter'.'newvar1; newvar1=substr(newvar1,mark+1); mark=2; w=0; put 'from'; newvar1=save1; do until(mark = 0); w=w+1; /* created dummy dataset name references for each byvar */ if w=1 then letter = 'a'; if w=2 then letter = 'b'; if w=3 then letter = 'c'; if w=4 then letter = 'd'; if w=5 then letter = 'e'; if w=6 then letter = 'f'; if w=7 then letter = 'g'; if w=8 then letter = 'h'; if w=9 then letter = 'i'; if w=10 then letter = 'j'; if w=11 then letter = 'k'; if w=12 then letter = 'l'; if w=13 then letter = 'm'; if w=14 then letter = 'n'; if w=15 then letter = 'o'; if w=16 then letter = 'p'; if w=17 then letter = 'q'; if w=18 then letter = 'r'; if w=19 then letter = 's'; if w=20 then letter = 't'; mark=index(left(trim(newvar1)),' '); subvar2=substr(left(trim(newvar1)),1,mark-1); put 'newdata(keep='subvar2') as 'letter','; if mark = 0 then do; /* define the 'from' SQL statement */ put 'newdata(keep='newvar1') as 'letter; newvar1=substr(newvar1,mark+1); put ';quit;';

/* Build strings for the 'set value' statements (stored in macro variable 'setvar') */ /* -All variables specified in the 'setvar' string will be set to the user specified numeric value ('setval') */ /* -This is only done for 'newly created' observations */ %if &setvar ne &miss %then %do; length build1 $200; newvarz="&setvar"; newvar1z=left(trim(newvarz)); build1=''; mark=2; do until(mark = 0); mark=index(left(trim(newvar1z)),' '); build1=left(trim(build1)) substr(left(trim(newvar1z)),1,mark- 1) "=&setval;"; if mark = 0 then do; build1=left(trim(build1)) trim(left(newvar1z)) "=&setval;"; newvar1z=substr(newvar1z,mark+1); call symput('setvar',build1); % proc printto; /* Form the Cartesian product (for 'key' variables) of the final dataset. */ /* The 'final' dataset contains all observation from the original dataset AND all 'newly created' observations. */ filename cartprod catalog 'work.temp.sqlx.output'; %include cartprod; /* -This is the complete sql statement used to form the Cartesian product */ /* - The dataset produced is named 'all' */ /* Set all user specified 'non-key' variables to the user specified numeric value */ data &outdata; merge &indata(in=a) all(in=b); by &byvar; if not a then do; %if &setvar ne &miss %then %do; &setvar % %mend addval;