A Macro to Create Program Inventory for Analysis Data Reviewer s Guide Xianhua (Allen) Zeng, PAREXEL International, Shanghai, China

Similar documents
Data Edit-checks Integration using ODS Tagset Niraj J. Pandya, Element Technologies Inc., NJ Vinodh Paida, Impressive Systems Inc.

A SAS Macro Utility to Modify and Validate RTF Outputs for Regional Analyses Jagan Mohan Achi, PPD, Austin, TX Joshua N. Winters, PPD, Rochester, NY

PharmaSUG 2013 CC26 Automating the Labeling of X- Axis Sanjiv Ramalingam, Vertex Pharmaceuticals, Inc., Cambridge, MA

A Tool to Compare Different Data Transfers Jun Wang, FMD K&L, Inc., Nanjing, China

Journey to the center of the earth Deep understanding of SAS language processing mechanism Di Chen, SAS Beijing R&D, Beijing, China

When Powerful SAS Meets PowerShell TM

A Macro that can Search and Replace String in your SAS Programs

Planning to Pool SDTM by Creating and Maintaining a Sponsor-Specific Controlled Terminology Database

SAS Drug Development Program Portability

Using GSUBMIT command to customize the interface in SAS Xin Wang, Fountain Medical Technology Co., ltd, Nanjing, China

Reading in Data Directly from Microsoft Word Questionnaire Forms

Using an ICPSR set-up file to create a SAS dataset

Paper CC16. William E Benjamin Jr, Owl Computer Consultancy LLC, Phoenix, AZ

Matt Downs and Heidi Christ-Schmidt Statistics Collaborative, Inc., Washington, D.C.

Automated Checking Of Multiple Files Kathyayini Tappeta, Percept Pharma Services, Bridgewater, NJ

CC13 An Automatic Process to Compare Files. Simon Lin, Merck & Co., Inc., Rahway, NJ Huei-Ling Chen, Merck & Co., Inc., Rahway, NJ

Let SAS Help You Easily Find and Access Your Folders and Files

Keeping Track of Database Changes During Database Lock

Electricity Forecasting Full Circle

PharmaSUG Paper PO12

Prove QC Quality Create SAS Datasets from RTF Files Honghua Chen, OCKHAM, Cary, NC

PDF Multi-Level Bookmarks via SAS

Macro Quoting: Which Function Should We Use? Pengfei Guo, MSD R&D (China) Co., Ltd., Shanghai, China

The Output Bundle: A Solution for a Fully Documented Program Run

Increase Defensiveness of Your Code: Regular Expressions

Quick Data Definitions Using SQL, REPORT and PRINT Procedures Bradford J. Danner, PharmaNet/i3, Tennessee

Submitting SAS Code On The Side

SAS automation techniques specification driven programming for Lab CTC grade derivation and more

Data Quality Review for Missing Values and Outliers

How to use UNIX commands in SAS code to read SAS logs

Create Metadata Documentation using ExcelXP

ODS/RTF Pagination Revisit

PharmaSUG Paper TT11

Run your reports through that last loop to standardize the presentation attributes

PharmaSUG Paper AD03

Automation of makefile For Use in Clinical Development Nalin Tikoo, BioMarin Pharmaceutical Inc., Novato, CA

How to Keep Multiple Formats in One Variable after Transpose Mindy Wang

Simplifying Your %DO Loop with CALL EXECUTE Arthur Li, City of Hope National Medical Center, Duarte, CA

%check_codelist: A SAS macro to check SDTM domains against controlled terminology

Your Own SAS Macros Are as Powerful as You Are Ingenious

PharmaSUG Paper AD06

A SAS Macro for Producing Benchmarks for Interpreting School Effect Sizes

Macro Method to use Google Maps and SAS to Geocode a Location by Name or Address

Program Validation: Logging the Log

The Dataset Diet How to transform short and fat into long and thin

SAS Programming Techniques for Manipulating Metadata on the Database Level Chris Speck, PAREXEL International, Durham, NC

ABSTRACT INTRODUCTION TRICK 1: CHOOSE THE BEST METHOD TO CREATE MACRO VARIABLES

SAS Macro Dynamics - From Simple Basics to Powerful Invocations Rick Andrews, Office of the Actuary, CMS, Baltimore, MD

BreakOnWord: A Macro for Partitioning Long Text Strings at Natural Breaks Richard Addy, Rho, Chapel Hill, NC Charity Quick, Rho, Chapel Hill, NC

Automating Comparison of Multiple Datasets Sandeep Kottam, Remx IT, King of Prussia, PA

PhUSE US Connect 2018 Paper CT06 A Macro Tool to Find and/or Split Variable Text String Greater Than 200 Characters for Regulatory Submission Datasets

Cover the Basics, Tool for structuring data checking with SAS Ole Zester, Novo Nordisk, Denmark

Logging the Log Magic: Pulling the Rabbit out of the Hat

SAS 9 Programming Enhancements Marje Fecht, Prowerk Consulting Ltd Mississauga, Ontario, Canada

Implementing external file processing with no record delimiter via a metadata-driven approach

Streamline Table Lookup by Embedding HASH in FCMP Qing Liu, Eli Lilly & Company, Shanghai, China

PharmaSUG China Paper 059

One Project, Two Teams: The Unblind Leading the Blind

Essential ODS Techniques for Creating Reports in PDF Patrick Thornton, SRI International, Menlo Park, CA

Clinical Data Visualization using TIBCO Spotfire and SAS

Quick and Efficient Way to Check the Transferred Data Divyaja Padamati, Eliassen Group Inc., North Carolina.

An Efficient Method to Create Titles for Multiple Clinical Reports Using Proc Format within A Do Loop Youying Yu, PharmaNet/i3, West Chester, Ohio

How to validate clinical data more efficiently with SAS Clinical Standards Toolkit

SAS Institute Exam A SAS Advanced Programming Version: 6.0 [ Total Questions: 184 ]

SAS Macro Dynamics: from Simple Basics to Powerful Invocations Rick Andrews, Office of Research, Development, and Information, Baltimore, MD

A Practical Introduction to SAS Data Integration Studio

ABSTRACT INTRODUCTION MACRO. Paper RF

Untangling and Reformatting NT PerfMon Data to Load a UNIX SAS Database With a Software-Intelligent Data-Adaptive Application

Virtual Accessing of a SAS Data Set Using OPEN, FETCH, and CLOSE Functions with %SYSFUNC and %DO Loops

A Macro to Keep Titles and Footnotes in One Place

Avoiding Macros in SAS. Fareeza Khurshed Yiye Zeng

Using SAS to Control the Post Processing of Microsoft Documents Nat Wooding, J. Sargeant Reynolds Community College, Richmond, VA

Importing CSV Data to All Character Variables Arthur L. Carpenter California Occidental Consultants, Anchorage, AK

Hidden in plain sight: my top ten underpublicized enhancements in SAS Versions 9.2 and 9.3

A Macro To Generate a Study Report Hany Aboutaleb, Biogen Idec, Cambridge, MA

LST in Comparison Sanket Kale, Parexel International Inc., Durham, NC Sajin Johnny, Parexel International Inc., Durham, NC

Using UNIX Shell Scripting to Enhance Your SAS Programming Experience

TLF Management Tools: SAS programs to help in managing large number of TLFs. Eduard Joseph Siquioco, PPD, Manila, Philippines

PharmaSUG Paper SP04

Exporting Variable Labels as Column Headers in Excel using SAS Chaitanya Chowdagam, MaxisIT Inc., Metuchen, NJ

Macros for Two-Sample Hypothesis Tests Jinson J. Erinjeri, D.K. Shifflet and Associates Ltd., McLean, VA

A SAS Macro to Create Validation Summary of Dataset Report

%MISSING: A SAS Macro to Report Missing Value Percentages for a Multi-Year Multi-File Information System

Automate Clinical Trial Data Issue Checking and Tracking

PharmaSUG China Paper 70

What Do You Mean My CSV Doesn t Match My SAS Dataset?

Cleaning up your SAS log: Note Messages

Using SAS to Control and Automate a Multi SAS Program Process Patrick Halpin, dunnhumby USA, Cincinnati, OH

Going Under the Hood: How Does the Macro Processor Really Work?

Missing Pages Report. David Gray, PPD, Austin, TX Zhuo Chen, PPD, Austin, TX

CDISC Variable Mapping and Control Terminology Implementation Made Easy

How a Code-Checking Algorithm Can Prevent Errors

Sorting big datasets. Do we really need it? Daniil Shliakhov, Experis Clinical, Kharkiv, Ukraine

29 Shades of Missing

PharmaSUG China Mina Chen, Roche (China) Holding Ltd.

ET01. LIBNAME libref <engine-name> <physical-file-name> <libname-options>; <SAS Code> LIBNAME libref CLEAR;

Developing Data-Driven SAS Programs Using Proc Contents

The Demystification of a Great Deal of Files

It s not the Yellow Brick Road but the SAS PC FILES SERVER will take you Down the LIBNAME PATH= to Using the 64-Bit Excel Workbooks.

Using SAS/SCL to Create Flexible Programs... A Super-Sized Macro Ellen Michaliszyn, College of American Pathologists, Northfield, IL

Transcription:

PharmaSUG 2018 - Paper QT-08 A Macro to Create Program Inventory for Analysis Data Reviewer s Guide Xianhua (Allen) Zeng, PAREXEL International, Shanghai, China ABSTRACT As per Analysis Data Reviewer s Guide (ADRG) released by the PhUSE CSS Working Group in January, 2015, a program inventory that contains all inputs (such as SDTM domains or other analysis datasets) and macros used should be listed in the section Submission of Programs. It can be laborious and time consuming to extract inputs and macros manually. This paper presents a SAS macro ExtDsnMan, which programmatically extracts all input datasets and macros and produces an XLS report. The code is currently hosted in my Github page: https://raw.githubusercontent.com/xianhuazeng/pharmasug/master/2018/extdsnman.sas INTRODUCTION When writing ADRG, we need to create the program inventory that contains all inputs (such as SDTM domains or other analysis datasets) and macros used in section Submission of Programs. To do this task manually might be cumbersome, so it seemed appropriate to create a macro to handle the work. Note that the code for this paper was written in UNIX operating environment. Therefore, users would need to make some adjustments to the code based on their operating environment. MACRO PARAMETERS I call this macro as %ExtDsnMan and it has 4 input parameters as listed in Table 1 below: Parameter Description P_PATH Path to the parent directory where all SAS programs are located. This is a REQUIRED parameter and does not have a default value. KWORD Library name of analysis or SDTM datasets. This is a REQUIRED parameter and its default value is analysis raw, with library names separated by a single pipe symbol. M_PATH Path to the directory where macros are located. This is a REQUIRED parameter and does not have a default value. O_PATH Path to the output directory where program inventory is located. This is not a REQUIRED parameter and does not have a default value. Table 1. ExtDsnMan Parameters METHODOLOGY ExtDsnMan initially reads all programs into a dataset and then extracts input datasets and macros used using SAS Perl regular expression functions. Approaches to programmatically extract input datasets and macros used are demonstrated as follows: READ ALL PROGRAMS INTO ONE DATASET Read all programs in the program parent directory including all subdirectories to a dataset using FOPEN/FREAD/FGET functions. This method is more efficient than DO loop since the loop goes through all programs, one by one. The code used to do this task is shown below: RC_FILE=filename('code', PATH,,'LRECL=32767'); FID=fopen('code'); do while (fread(fid)=0); length CODELINE $32767; RC_READ=fget(FID, CODELINE, 32767); 1

CLOSE=fclose(FID); RC_FILE=filename('code'); IDENTIFY INPUTS AND MACROS CALL As the rules for legal SAS names, a name can be 1 to 32 characters long, begin with a letter or underscore, and the rest of the characters can contain only letters, numerals, or underscores. SAS Perl regular expression function is an ideal method. A pattern like library.dataset in program is considered as input: RE=prxparse("/\b(&kword)\.(?=\w{1,32})([a-zA-Z_][a-zA-Z0-9_]*)/"); A pattern like %maroname in program is considered as macro call: RE=prxparse('/%(?=\w{1,32})[a-zA-Z_][a-zA-Z0-9_]*\(?/'); Note that a pattern like library.dataset or %maroname in a comment should not be identified as input or macro call. The following is regular expression to remove comments: RE=prxparse('/^(\* %\* \/\*).+$/'); EXTRACT INPUTS AND MACROS CALL In this section, the ultimate goal is to parse the code lines to extract the inputs and macros used. The regular expression used to extract the inputs is shown below: RE=prxparse("s/.*?(?:(?:&kword)\.(\w+))?/\1 /i"); Regular expression visualization by Regexper: Figure 1. Regular Expression Visualization The following piece of codes extracts the macros: /*Create macros list within macro library */ proc sql noprint; quit; select FNAME into :mlist separated by ' ' from mlib ; RE=prxparse("s/.*?(?:%(&mlist)\b)?/\1 /i"); Here is a brief explanation of the regular expression above.. matches any single character except newline. *? is lazy repetition factor, matches 0 or more occurrences of the preceding character as few times as possible. (?: ) means non-capturing group. % matches percent sign. \b matches a word boundary. The second ( and ) characters matches a pattern and creates a capture buffer for the match. The last? is greedy repetition factor, matches the first capturing group zero or one time as many times as possible. The \1 matches capture buffer 1. MACRO SOURCE CODE %macro extdsnman(p_path =, 2

m_path =, kword =analysis raw, o_path = ); /*To mute "WARNING: The quoted string currently being processed has become more than 262 characters long. You may have unbalanced quotation marks."*/ options NOQUOTELENMAX; /*List all files within a directory including subdirectories*/ filename dir pipe "find &p_path -name '*.sas'"; data code; infile dir truncover lrecl=1024; length PATH $1024 FNAME $200; input PATH 1-1024; retain RE; if _N_=1 then RE=prxparse('s/(.+)\/(.+?)/\2/'); FNAME=prxchange(RE, 1, PATH); /*Read all files within in a directory including subdirectories into a dataset*/ RC_FILE=filename('code', PATH,,'LRECL=32767'); FID=fopen('code'); do while (fread(fid)=0); length CODELINE $32767; RC_READ=fget(FID, CODELINE, 32767); CODELINE=compress(CODELINE,, 'kw'); if ^missing(compress(codeline)) then output; CLOSE=fclose(FID); RC_FILE=filename('code'); keep PATH CODELINE FNAME; /*Close the pipe*/ filename dir clear; %macro ftype(type=); data temp01; set code; retain RE1 RE2; if _N_=1 then do; RE1=%if &type=input %then prxparse("/\b(&kword)\.(?=\w{1,32})([a-za- Z_][a-zA-Z0-9_]*)/"); %else prxparse('/%(?=\w{1,32})[a-za-z_][a-za-z0-9_]*\(?/');; RE2=prxparse('/^(\* %\* \/\*).+$/'); if prxmatch(re1, CODELINE) and ^ prxmatch(re2, cats(codeline)); proc sort nodupkey; by PATH FNAME CODELINE; data temp02; set temp01; length &type $32767; retain RE; 3

if _N_=1 then RE=%if &type=input %then prxparse("s/.*?(?:(?:&kword)\.(\w+))?/\1 /i"); %else prxparse("s/.*?(?:%(&mlist)\b)?/\1 /i");; &type=compbl(prxchange(re, -1, CODELINE)); &type=prxchange('s/ /, /o', -1, cats(&type)); data &type.s; set temp02; by PATH FNAME; length &type.s $32767; retain &type.s; if first.fname then &type.s=cats(&type); else &type.s=catx(", ", &type.s, &type); retain RE1 RE2; if _N_=1 then do; RE1=prxparse('s/(\b.+?\b)(,\s.*?)(\b\1+\b)/\2\3/i'); RE2=prxparse('/(\b.+?\b)(,\s.*?)(\b\1+\b)/i'); if last.fname; /*Remove repeated values*/ do i=1 to 100; &type.s=prxchange(re1, -1, cats(&type.s)); if not prxmatch(re2, cats(&type.s)) then leave; %if &type=input %then &type.s=prxchange('s/' cats(scan(fname, 1, '.')) '//i', -1, cats(&type.s));; &type.s=prxchange('s/(, )+/, /o', -1, cats(&type.s)); &type.s=prxchange('s/^(, )+ (,\s?)+$//o', -1, cats(&type.s)); %if &type=macro %then %do; if ^missing(&type.s) then do; &type.s=prxchange('s/, $/.sas, /o', -1, cats(&type.s)); &type.s=prxchange('s/,$//o', -1, cats(&type.s)); % keep FNAME &type.s; %mend ftype; /*Dataset name*/ %ftype(type=input) /*List all macros within macro library*/ filename macro pipe "find &m_path -name '*.sas'"; data mlib; infile macro truncover lrecl=1024; length PATH $1024 FNAME $200; input PATH 1-1024; FNAME=prxchange("s/(.+)\/(.+?).sas/\2/o", 1, PATH); /*Close the pipe*/ filename macro clear; /*Create macros list within macro library */ proc sql noprint; 4

select FNAME into :mlist separated by ' ' from mlib ; quit; /*Macros name*/ %ftype(type=macro) /*Combine all*/ proc sql; create table final as select a.fname 'Program name', INPUTS 'Inputs', MACROS 'Macros used' from code a left join inputs b on a.fname=b.fname left join macros c on a.fname=c.fname ; quit; /*Produce report*/ ods path work(update) sashelp.tmplmst(read); ods listing close; ods tagsets.excelxp file="&o_path.program inventory.xls" style=htmlblue options(frozen_headers = '1' autofilter = 'all' sheet_name = 'Inputs and macros used' absolute_column_width = '15, 50, 50' ); proc print data=final label noobs; ods tagsets.excelxp close; ods listing; %mend extdsnman; To include this macro in your SAS program, just run the following piece of code: filename mac temp; proc http url="https://raw.githubusercontent.com/xianhuazeng/pharmasug/master/2018/extd snman.sas" method="get" out=mac; filename code temp; data _null_; file code; infile mac; input; put _infile_; 5

%inc code; OUTPUT We use ODS TAGSETS.EXCELXP to produce output in Excel. A part of program inventory is shown below: Figure 2. Program Inventory CONCLUSION The macro described in this paper is a very useful tool which can greatly improve the efficiency of creating program inventory for ADRG. The usage of FOPEN/FREAD/FGET functions in this macro can also be used for other cases where retrieving multiple files, such as QC compare outputs and SAS log files. REFERENCE PhUSE CSS Working Group. Analysis Data Reviewer s Guide (ADRG), v1.1. Available at http://www.phusewiki.org/wiki/images/d/df/adrg_v1.1_2015-01-26.zip Jeff Avallone. Regexper. Available at http://www.regexper.com/. CONTACT INFORMATION Your comments and questions are valued and encouraged. Contact the author at: Name: Xianhua Zeng Enterprise: PAREXEL International Address: 9F & Unit A/B/C 10F, No.506, Shangcheng Road, Pudong District, Shanghai, China, 200120 Work Phone: +86 21-51118305 Fax: +86 21 61609196 E-mail: Allen.Zeng@PAREXLE.com Web: http://www.parexel.com/ http://www.xianhuazeng.com/ SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. indicates USA registration. Other brand and product names are trademarks of their respective companies. 6