The Impossible An Organized Statistical Programmer Brian Spruell and Kevin Mcgowan, SRA Inc., Durham, NC


Paper CS-061

ABSTRACT

Organization is the key to every project. It provides a detailed history of what the client wanted and shows the means by which the programmer met client goals. It also allows for reproducibility. Finally, and perhaps most importantly, it reduces headaches. This paper demonstrates tools and techniques developed at Constella that were used for a large clinical trials project. Programs were standardized so that all changes would have to be made in one place. A master meta data file was developed and maintained throughout the project. The meta file itself was an Excel spreadsheet. Its columns named the program to call, supplied titles, subset the data based on certain variables, and so on. This cut down on human error and significantly reduced programming time.

Keywords: organization, macro, meta file, header

INTRODUCTION

Recently SRA was working with a pharmaceutical company that was in the process of developing a new drug. For obvious reasons I cannot mention the drug or the company. This particular company had a meeting scheduled with the Food and Drug Administration (FDA). Our role was to generate various outputs that the client could present to the FDA to show the progress they were making on the drug in question. This required SRA to generate hundreds of graphs, tables and listings. Fortunately, many of those graphs and tables were very similar in structure and layout. The differences came from the actual data, titles, footnotes, labels, etc.

Rather than write 100 different programs to generate each of these outputs, SRA decided to standardize a few programs (six) which were used to generate all the outputs needed. The six programs were written in such a way that they would not require modifications. It was decided that all modifications would be made in one central place. A master meta file was devised in Excel to serve this purpose.

In this paper I also hope to show a standard header used in SRA SAS programs, as well as some programming techniques involving macros and do loops which should make programs easier to maintain.

THE OUTPUT

The following example demonstrates the similarities and differences between the various outputs requested by the client. The two outputs represent very different endpoints, but the graphs have similar components.

What changes are the y-label and the points on the x-axis. The actual data that is fed into the program changes as well. Titles (not displayed here) are also different. These same similarities and differences exist for the tables and listings. We had literally hundreds of graphs and tables to produce, so it was thought best to standardize the process, both for ease of making changes and for reproducibility. The same SAS program generated these two outputs.

THE META FILE

The meta file, which contained the information needed to generate all the output, was built in Excel. For the clinical drugs project thirty-eight columns were populated. These columns varied in use. They detailed the name of the output file, told SAS which macro program to load, assigned footnotes and labels to graphs, indicated which person was assigned to QC the output, etc. Below is an example of the first four columns:

    Column      Example value
    ---------   ------------------------------
    PGM NAME    F_PIC_MEAN
    TFL         Figure 14.2.3.1
    TFL Title   Composite PIC (Mean) Over Time
    PGM Type    Figure

The program name is really the name of the file generated. It is assigned an appropriate number (the TFL column) and a title, both of which will appear in the output file. The program type indicates that this file is a figure.

    Column           Example value
    --------------   -----------------------------------------------------------
    Class            PIC
    foot1            Composite PIC is the sum of TNFalpha and IL-6 measurements.
    foot2            1,2,4,5,9
    Input datasets   Labs

These next four columns indicate the class designation (what type of data is being represented), the footnotes and the input dataset. The Input datasets column tells the program which dataset to use. The foot2 column indicates which of the standard footnotes to use (i.e. company name, label designations, etc.).

    Column        Example value
    -----------   ------------------------
    Val           US_RES
    Select1       tst_name='composite PIC'
    Select2       (blank)
    Orientation   Landscape
    Macro Used    BAR

The Val column indicates which variable from the Labs dataset will be used to populate the figure. The Select1 column indicates which tests (in this instance the PIC tests) will be displayed in the figure. The Macro Used column is BAR, which tells the SAS program to generate a bar graph for these endpoints. The remaining columns are administrative (programmer, etc.).

You can see how helpful this is when you have a very standardized process. Everything that changes takes place in the meta file and not in the actual program. The programs are static, so you never have to worry about not being able to reproduce an output. Filters were added to the Excel spreadsheet to make the file easier to navigate. The next section shows the code that reads in the above Excel file.

THE CODE

The code which reads the Excel file into SAS is below:

    PROC IMPORT OUT= WORK.metadata(where =(pgm_name^=""))
        DATAFILE= "M:\Clinical\Projects\\Programming\Tools\meta_data_update.xls"
        DBMS=EXCEL REPLACE;
        SHEET="'newmeta$'";
        GETNAMES=YES;
        MIXED=NO;
        SCANTEXT=YES;
        USEDATE=YES;
        SCANTIME=YES;
    RUN;

We used PROC IMPORT to read the Excel file and called the resulting SAS dataset metadata. The WHERE= dataset option drops any spreadsheet row whose program name is blank. DATAFILE= gives the location of the meta file on the network, and SHEET= names the sheet we wanted to read in, newmeta. This left us free to use the spreadsheet's other sheets for something else.
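The paper does not show the program that turns each metadata row into a macro call. One possible approach, offered only as a minimal sketch and not as the authors' driver program, is a DATA step that uses CALL EXECUTE to build one call per row. The dataset work.metadata_demo, the macro %demo_bar, its keyword parameters, and the variable name macro_used are all hypothetical stand-ins chosen for illustration; the single data row reuses the example values from the tables above.

```sas
/* Hypothetical stand-in for the imported metadata. In practice this
   would be the WORK.metadata dataset created by PROC IMPORT. */
data work.metadata_demo;
    infile datalines dlm='|' truncover;
    length pgm_name $32 input_datasets $32 val $32
           select1 $200 macro_used $32;
    input pgm_name input_datasets val select1 macro_used;
    datalines;
F_PIC_MEAN|Labs|US_RES|tst_name='composite PIC'|BAR
;
run;

/* Hypothetical macro: it only echoes its parameters to the log,
   standing in for a real graphing macro. */
%macro demo_bar(pgm_name=, input_datasets=, val=, select1=);
    %put pgm_name=&pgm_name input_datasets=&input_datasets val=&val;
    %put select1=%superq(select1);
%mend demo_bar;

/* Build one macro call per metadata row. Wrapping the call in %nrstr
   delays macro execution until after this DATA step has finished. */
data _null_;
    set work.metadata_demo;
    if upcase(macro_used) = 'BAR' then
        call execute(cats('%nrstr(%demo_bar(pgm_name=', pgm_name,
                          ', input_datasets=', input_datasets,
                          ', val=', val,
                          ', select1=', select1, '))'));
run;
```

With this pattern, adding an output means adding a spreadsheet row rather than editing a program, which matches the goal of keeping the programs static. A value that contains a comma or an unbalanced parenthesis would need extra quoting before being passed this way.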

The resulting permanent SAS dataset is saved on the network in the project folder. It is then read by another SAS program, which parses all the information needed to produce output. The next section gives an example of one of the macros used to generate the figures.

MACRO BAR

In our example above we were generating a bar graph for the PIC data. Let us now explore the BAR macro program and see how it works.

    %macro BAR(PGM_Name, Input_Datasets, Val, Select1, YLabel, Precision,
               Title0, Title1, Title2, Title3, Foot1, Foot2, Foot3,
               Orientation);
        data tmp;
            keep Cohort usubjid Visit_TPT_Key Value;
            set adata.&input_datasets.;
            /* apply the Select1 filter only when the meta file supplies one */
            %if (%nrbquote(&select1)^=%str()) %then %do;
                Where %unquote(&select1);
            %end;
            Value = &Val.;
        run;
        ...

The above macro takes as parameters several of the values which were originally entered in the Excel file. This allows us to keep the macro programs static. Inputs and outputs may change, but the mechanism which turns input into output remains the same.

THE ALL IMPORTANT HEADER

There were instances where client requests necessitated changing the actual macro programs. A mechanism was devised which enabled us to keep track of those changes and provided some accountability with regard to who made them. That mechanism was the all important SAS program header. The following is a standard header used in our division of SRA Inc. It contains all the necessary background information at a convenient location within a SAS program: at the beginning!

    /*
    File Name:
    Description:
    External References:
    Type        Name               Description of Reference
    ----------  -----------------  ----------------------------------------
    INPUT
    OUTPUT
    Modification History:
    Num  Date      Programmer  Description of Modification
    ---  --------  ----------  ------------------------------------------------
    001  mm/dd/yy  Name        CREATED
    */

This is a useful template regardless of what project it is used for. In an easy-to-find location, a new programmer can determine what the program's purpose is, what inputs it requires and what it outputs. The most important part of this header is the Modification History portion. In this section we can determine what changes have been made, when they were made and who made them. This is very helpful when you have to revisit the program after a long time has elapsed. It also provides much needed accountability: you can ask the programmer who made a modification why he or she made that particular change.

    /*
    File Name:            Sesug2008.sas
    Description:          Program written to demonstrate usefulness of SAS program header
    External References:  Developed for the 2008 South Eastern SAS Users Group
    Type        Name               Description of Reference
    ----------  -----------------  ----------------------------------------
    INPUT       meta_file.xls      1) Excel file
    OUTPUT      metafile           1) sas dataset
    Modification History:
    Num  Date      Programmer     Description of Modification
    ---  --------  -------------  ------------------------------------------------
    001  04/26/08  Brian Spruell  CREATED
    002  05/11/08  Brian Spruell  Modified library name to reflect new network location
    003  06/07/08  Kevin Mcgowan  Added new columns to import program. See new
                                  portion labeled Mcgowan updates
    */

As the above example shows, it is now very easy to determine the purpose of this program. We learn that the code that follows the header was written for the 2008 SESUG conference. We also learn that this particular program requires an Excel file named meta_file and that the resulting output is a SAS dataset named metafile. More importantly, the modification history provides us with some accountability. We know who created the program, what changes were later made to it and who made those changes.

MACRO AND DO LOOPS

I recently worked on a project which required generating a large SAS dataset from several smaller SAS datasets. The datasets came from various NHANES (National Health and Nutrition Examination Survey) study years (1999-2000, 2001-2002, 2003-2004). The number of datasets needed varied based on which survey year was used. Also, each year's datasets existed in separate subfolders. I tried my best to keep the code short and clean by using do loops and macros. I first created a macro variable named basedir which would contain the project base directory:

    %let basedir=m:\chr\statistics\projects\niehs-ntp_task1\work\human Studies\xenobiotic;

Second, I assigned a libname statement to where I wanted to output the merged dataset:

    libname outdata "&basedir.\sas";

Next I built a macro which would read in the datasets from various locations, merge them together and finally output them to outdata:

    %macro buildds;
        /* SURVEY YEARS */
        %let years = %str(1999-2000 2001-2002 2003-2004);
        %let datasets = %str(lab28poc lab06 lab26pp sspfc_a,
                             l28poc_b l06hm_b l26pp_b,
                             l28dfp_c l06bmt_c l28ocp_c l24pfc_c);
        %let loops = %str(4 3 4);
        %let names = %str(dioxin mercury pest perflor);
        %local i j;

First I created several macro variables to contain the survey years, the dataset names, the output dataset names and the number of times I wanted my macro to loop for each survey. The loop count depends on the number of datasets within a survey: the 2001-2002 survey did not analyze certain forms of compounds which were analyzed in the other two. I then created my outer loop i, which loops through each of the surveys:

        %do i = 1 %to 3;
            %let year = %scan(&years., &i, ' ');
            %let loop = %scan(&loops., &i, ' ');
            %let sets = %scan(&datasets., &i, ',');
            libname nhanes&i. "&basedir.\data\&year.";

I used the loop to assign the macro variables year, loop and sets, all of which are based on the plural macro variables. For instance, in the first iteration year was assigned a value of 1999-2000, loop had a value of 4 and sets had a value of lab28poc lab06 lab26pp sspfc_a. I then had my macro perform another loop called j, which has as many iterations as defined by the macro variable loop:

            %do j = 1 %to &loop;
                %let dataset = %scan(&sets., &j, ' ');
                %let name = %scan(&names., &j, ' ');
                libname XP xport "&basedir.\data\&year.\&dataset..xpt";
                proc copy in=xp out=nhanes&i.;
                run;
                data &name.&i.;
                    length nhanesyr $9;
                    set nhanes&i..&dataset.;
                    nhanesyr="&year.";
                run;
                data Xenobiotic&i;
                    %if (&j = 1) %then %do;
                        set &name.&i;
                    %end;
                    %else %do;
                        merge xenobiotic&i &name.&i.;
                        by SEQN;
                    %end;
                run;
            %end;

For the first inner iteration, dataset takes on a value of lab28poc and name is dioxin. Dataset dioxin1 is created by setting the dataset lab28poc, which PROC COPY read from the 1999-2000 survey folder. For each survey year a Xenobiotic dataset is created with the outer iteration number as an identifier. On the first inner iteration the dataset is simply set into it. On subsequent inner iterations the new dataset is merged in by the participant id (SEQN). Each Xenobiotic# dataset is then appended to Xenobiotic, which is output to the network for further analysis.

            data Xenobiotic;
                /* generates "set xenobiotic1;" on the first pass and
                   "set xenobiotic xenobiotic&i;" on later passes */
                %if (&i=1) %then set;
                %else set xenobiotic;
                xenobiotic&i;
            run;
        %end;
    %mend buildds;

CONCLUSION

At the beginning, the time and effort it takes to become an organized programmer may seem substantial. However, in terms of overall costs and reproducibility, the long term costs (and potential headaches) are reduced significantly. Hopefully the header and the meta Excel file idea will prove helpful in your future SAS programming endeavors.

ACKNOWLEDGMENTS

Deepak Mav, senior statistician and primary programmer on the meta program.
Renee Jaramillo, statistical programmer, assisted with the meta program and the BAR macro.

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the author at:

    Brian Spruell
    SRA International
    2605 Meridian Parkway Suite 200
    Durham, NC 27713
    Work Phone: (919) 313 7673
    E-mail: brian_spruell@sra.com
    Web: www.sra.com

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.