Prove QC Quality Create SAS Datasets from RTF Files
Honghua Chen, OCKHAM, Cary, NC

ABSTRACT

Because collecting drug trial data is expensive and affects human life, the FDA and most pharmaceutical company SOPs require all datasets and TLFs to be checked by independent secondary QC programmers. Comparing hundreds or even thousands of pages of tables and listings is tedious and consumes a huge amount of a QC programmer's time. This paper outlines a process flow that replaces the pre- and post-processing macro calls in a primary SAS program with ODS RTF statements to create a temporary SAS program, executes that program to create a temporary RTF file [3], extracts data from the RTF file to create the primary final SAS dataset, and compares it with the QC final SAS dataset using PROC COMPARE. The listing produced by PROC COMPARE can be saved for auditing purposes. The process can improve QC performance, reduce validation processing and paperwork, and ultimately prove QC quality. The full program, a sample input RTF file, and the output dataset are available as appendices.

INTRODUCTION

Two types of RTF output are used in the pharmaceutical industry. The first involves converting SAS output files (.lst) to RTF and doing some post-processing. Various techniques are available on the internet, such as out2rtf (search support.sas.com), created for this type of output by David Ward and dating back to May 1999. Variations of that macro have played important roles in automating the post-processing of .lst files. The second type is called in-text RTF and is created with the SAS ODS RTF destination. This two-dimensional table format is preferred by medical writers because copying and pasting the table does not shift values between columns. A SAS dataset is also a two-dimensional table, so extracting an RTF file into a SAS dataset is a logical choice.
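To illustrate why an in-text RTF table maps naturally onto dataset rows and columns, the following sketch (not part of the original program) writes a three-row table through ODS RTF and prints the raw RTF lines that carry table cells to the log. The file name demo.rtf, the macro variable demo_rtf, and the use of SASHELP.CLASS are assumptions made only for this illustration; the exact RTF tags emitted may vary by SAS version.

```sas
/* Hypothetical demo file in the WORK directory (an assumption for this sketch). */
%let demo_rtf = %sysfunc(pathname(work))/demo.rtf;

/* Create a small in-text RTF table with ODS RTF. */
ods listing close;
ods rtf file="&demo_rtf";
proc report data=sashelp.class(obs=3) nowd;
    columns name sex age;
run;
ods rtf close;
ods listing;

/* Read the RTF file back as plain text and write every line that */
/* contains a table-cell terminator (\cell}) to the SAS log.      */
data _null_;
    infile "&demo_rtf" lrecl=32767;
    input;
    if index(_infile_, '\cell}') > 0 then put _infile_;
run;
```

Each line written to the log should hold the text of one cell, which is the property the extraction code in Appendix II relies on.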
THE ORIGINAL PROC REPORT PROCESS

Most pharmaceutical reporting systems require a pre-process (%prtsetup) before the PROC REPORT step and a post-process (%pageprt) after it. The pre-process sets up the report destination, font, page layout, etc., while the post-process adds page numbers and formats the report according to the company's standards. See the following:

    libname testdata "/u01/home/hchen/company/drugname/protocols";

    %prtsetup;

    proc report data=testdata.final nowd spacing=1 split='*' headline;
        columns ('--' subjid bthdt age sex newrace ethnic height weight);
        define subjid  / order width=10;
        define bthdt   / display 'Birth Date' width=10;
        define age     / display width=7 'Age*(years)';
        define sex     / display width=6 'Sex';
        define newrace / display width=9 flow 'Race';
        define ethnic  / display width=10 flow 'Ethnicity';
        define height  / width=6 'Height*(cm)';
        define weight  / width=6 'Weight*(kg)';
        title2 'Listing of demographics and baseline characteristics';
        title3 'Full Analysis Set';
    run;

    %pageprt;

THE MODIFIED PROC REPORT PROCESS

Using Perl search and replace [1], we replace the pre-process (%prtsetup) with OPTIONS and ODS RTF statements that set up a new temporary destination and the simplest possible format for the RTF file. We then replace the post-process (%pageprt) with ODS RTF CLOSE. See the following:

    %let pgm_folderx=/u01/home/hchen/company/drugname/protocols;
    %let slashx =/;
    libname testdata "/u01/home/hchen/company/drugname/protocols";

    ods listing close;
    options nodate nonumber ORIENTATION=LANDSCAPE device=sasemf;
    ods rtf file="&pgm_folderx.&slashx.l-dm-temp.rtf";

    /* the PROC REPORT step from the original program goes here, unchanged */

    ods rtf close;
    ods listing;

EXTRACTING RTF OUTPUT TO SAS DATASET

We do not need to understand RTF tags [2] or parse RTF [5] to do the job. The new method only needs to scan the text within the RTF file, eliminate all unrelated rows, and transform what remains into two datasets: the primary final dataset with variables col1 to coln, and a dataset of titles, column headers, and footnotes. See Appendix II (source code), from data rtf_temp1 to data rtf_temp4.

COMPLETING THE TASK

The QC program creates the QC final dataset containing variables col1 to coln and compares it to the primary final dataset using PROC COMPARE.

    proc compare base=primary_final compare=qc_final listall;
    run;

    proc compare base=primary_tit_colhd_ft compare=qc_tit_colhd_ft listall;
    run;

CONCLUSION

We discussed a method of producing a dataset from an RTF file. Although the program was created for one company, with little emphasis on ease of modification, adapting it to work for others is an achievable goal. Former Chinese leader Deng Xiaoping [4] is known for his most famous quotation: "I don't care if it's a white cat or a black cat. It's a good cat as long as it catches mice." Here the quote can be taken to mean that being creative, being more effective, and serving the objectives of the client matter more than following a traditional approach.
REFERENCES

[1] Shuguang Zhang. Use Perl Regular Expressions in SAS.
[2] Sean M. Burke. The Universal Document Format RTF Pocket Guide.
[3] BIOGEN IDEC. SMART System Users Guide.
[4] WIKIPEDIA. Deng Xiaoping.
[5] Duong Tran. %RTFparser.

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the author at:

Honghua Chen
OCKHAM
8000 Regency Parkway, Suite 360
Cary, North Carolina, 27518
Phone: 4439380592
Email: hchen@ockham.com
Web: www.ockham.com

ACKNOWLEDGMENTS

I would like to thank the Biogen Idec programming team in RTP, NC for their helpful suggestions and assistance in testing the program presented in this paper. I would also like to thank Adam Gilbert and Juliet Allen for their encouragement and assistance in reviewing the macro.

SAS is a registered trademark of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.

DISCLAIMER

All code contained in this paper is provided on an AS IS basis, without warranty. The author makes no representation or warranty, either express or implied, with respect to the programs, their quality, accuracy, or fitness for a specific purpose. Therefore, the author shall have no liability to you or any other person or entity with respect to any liability, loss, or damage caused or alleged to have been caused directly or indirectly by the programs provided in this paper. This includes, but is not limited to, interruption of service, loss of data, loss of profits, or consequential damages from the use of these programs.

APPENDIX I (RTF TABLE):

APPENDIX II (SOURCE CODE):

    %macro get_final(inpgm_folder=,pgm_folder=,pgm_name=,s_string=,
                     e_string=,debug=no);

    *** Output a simple Perl program to SAS dataset qc_ods_rtf ***;
    options NOQUOTELENMAX;
    data qc_ods_rtf;
        length pgm_code $200;
        pgm_code='open (INFILE,"' || "&inpgm_folder" || '/' || "&pgm_name" ||
                 '.sas" ) or die "can not open the input file";';
        output;
        pgm_code='open (OUTFILE,">' || "&pgm_folder" || '/' || "&pgm_name" ||
                 '-temp.sas" ) or die "can not open output file";';
        output;
        pgm_code='select (OUTFILE);';
        output;
        pgm_code='while (<INFILE>){';
        output;
        pgm_code='s/' || "&s_string" || '/ods listing close; options nodate nonumber' ||
                 ' ORIENTATION=LANDSCAPE device=sasemf; ';
        output;
        pgm_code='ods rtf file="&pgm_folderx.' || '&slashx.' || "&pgm_name" ||
                 '-temp.rtf";/g;';
        output;
        pgm_code='s/' || "&e_string" || '/ods rtf close; ods listing; /g;';
        output;
        pgm_code='print; }';
        output;
        pgm_code='close (INFILE);';
        output;
        pgm_code='close (OUTFILE);';
        output;
    run;

    *** Call BIOGEN SMART utility macro putpgm to write out perl program qc_ods_rtf.sas ***;
    %let exe_lib =&pgm_folder/;
    %include "/biostats/macros/smart/putpgm.sas";
    %putpgm(qc_ods_rtf);

    *** Execute the perl program qc_ods_rtf.sas to create a modified primary program &pgm_name.-temp.sas ***;
    x "perl &pgm_folder/qc_ods_rtf.sas";

    data temp000;
        length pgm_code $200;
        pgm_code='%let pgm_folderx=' || "&pgm_folder" || ';';
        output;
        pgm_code='%let slashx =/;';
        output;
    run;
    %putpgm(temp000);

    x "cat &pgm_folder./temp000.sas &pgm_folder./&pgm_name.-temp.sas > &pgm_folder./&pgm_name.-temp2.sas";

    *** Execute the modified primary program &pgm_name.-temp.sas ***;
    %include "&pgm_folder./&pgm_name.-temp2.sas";

    *** Read in the rtf file &pgm_name.-temp.rtf created from &pgm_name.-temp.sas ***;
    data rtf_input;
        infile "&pgm_folder./&pgm_name.-temp.rtf" delimiter='00'x MISSOVER DSD
               lrecl=32767 firstobs=1;
        format f1 $500.;
        input f1 $ ;
    run;

    *** Extract text from cells ***;
    data rtf_temp1;
        set rtf_input;
        length text $1000;
        *** Keep all cells from the table ***;
        if ( index(f1,'\cell}') or index(f1,'\row}') or index(f1,'\trowd\') ) > 0 ;
        *** Delete column titles ***;
        if ( index(f1,'\b\') ) = 0;
        *** Find start position of text ***;
        pos1 = index(f1,'{');
        *** Find end position of text ***;
        pos2 = index(f1,'\cell}');
        lengthx = pos2 - pos1-1;
        if pos1 ^= 0 and pos2 ^= 0 then do;
            if pos2 = pos1 + 1 then text = ' ';
            else text = substr(f1,pos1+1, lengthx);
        end;
    run;

    data rtf_input1;
        set rtf_input;
        length f1x $1000;
        retain delx 0 f1x '';
        if (index(f1,'\b\') > 0 and index(f1,'\line}') > 0) then do;
            f1x = trim(left(f1));
            delx = 1;
            delete;
        end;
        if delx = 1 and index(f1,'\line}') > 0 then do;
            f1x = trim(left(f1x)) || trim(left(f1));
            delete;
        end;
        if delx = 1 and index(f1,'\cell}') > 0 then do;
            f1x = trim(left(f1x)) || trim(left(f1));
            delx = 0;
            f1 = tranwrd(f1x,'{\line}',' ');
            f1 = tranwrd(f1,'\~',' ');
            f1x = ' ';
        end;
    run;

    data rtf_tit_foot1;
        set rtf_input1;
        length text $1000;
        retain group 0 ;
        if index(f1,'\bkmkend') > 0 then group = 2000;
        if index(f1,'\header') > 0 then group = 1000;
        if index(f1,'\footer') > 0 then group = 3000;
        group = group + 1;
        *** Keep all cells from the table ***;
        *** keep column titles ***;
        if ((index(f1,'\b\') and index(f1,'\cell}')) or index(f1,'\row}') or
            index(f1,'\trowd\') ) > 0 ;
        *** Find start position of text ***;
        pos1 = index(f1,'{');
        *** Find end position of text ***;
        pos2 = index(f1,'\cell}');
        lengthx = pos2 - pos1-1;
        if pos1 ^= 0 and pos2 ^= 0 then do;
            if pos2 = pos1 + 1 then text = ' ';
            else text = substr(f1,pos1+1, lengthx);
        end;
        if text = '' or index(text,' ') > 0 then delete;
    run;

    proc sort nodupkey;
        by group text;
    run;

    data rtf_tit_foot2;
        set rtf_input;
        length text $1000;
        retain keepx 0 group 3000 ;
        *** keep footnotes created by macro setft ***;
        if index(f1,' ') > 0 and index(f1,'\line}') > 0 then do;
            keepx = 1;
            group = 3000;
        end;
        if keepx = 1 then group = group + 1;
        if keepx = 1 and index(f1,'\cell}') > 0 then keepx = 0;
        text = tranwrd(f1,'{\line}',' ');
        text = tranwrd(text,'\~',' ');
        if keepx = 0 then delete;
        if keepx = 1 and group = 3001 then delete;
        if text = '' then delete;
    run;

    proc sort nodupkey;
        by group text;
    run;

    *** Find the maximum column number ***;
    data rtf_temp2(drop= tot) tot_temp(keep = grp tot);
        set rtf_temp1;
        retain grp 0 col -1;
        if index(f1,'\trowd\') > 0 then do;
            grp = grp + 1;
            col = -1;
        end;
        col = col + 1;
        output rtf_temp2;
        if index(f1,'\row') > 0 then do;
            tot = col;
            output tot_temp;
        end;
    run;

    data rtf_temp3;
        merge rtf_temp2(in=a) tot_temp(in=b);
        by grp;
        if a;
    run;

    proc sql noprint;
        select max(tot) into :max_tot from tot_temp;
    quit;
    %put ****&max_tot****;

    *** Delete titles and footnotes cells ***;
    data rtf_temp4;
        set rtf_temp3;
        if tot = &max_tot;
        if col = 0 or col = &max_tot then delete;
    run;

    *** create SAS dataset with variable col1 to coln ***;
    proc transpose data=rtf_temp4 out=rtf_temp5 prefix=col;
        by grp;
        id col;
        var text;
    run;

    *** Get the result ***;
    data primary_tit_colhd_ft;
        set rtf_tit_foot1 rtf_tit_foot2;
        if index(text,'source: ') > 0 then delete;
        keep text group;
    run;

    data primary_final;
        set rtf_temp5;
        *** delete rows if all cells are blank ***;
        length col $1000;
        col = %do i = 1 %to &max_tot - 2;
                  trim(left(col&i)) ||
              %end;
              trim(left(col%eval(&max_tot - 1)));
        if col = '' then delete;
        drop grp _name_ col;
    run;

    %if %upcase(&debug)=NO %then %do;
        *** Delete perl program qc_ods_rtf.sas ***;
        x "rm &pgm_folder./qc_ods_rtf.sas";
        *** Delete the modified primary program &pgm_name.-temp.sas ***;
        x "rm &pgm_folder./&pgm_name.-temp.sas";
        x "rm &pgm_folder./&pgm_name.-temp2.sas";
        x "rm &pgm_folder./temp000.sas";
        *** Delete the rtf file &pgm_name.-temp.rtf ***;
        x "rm &pgm_folder./&pgm_name.-temp.rtf";
        proc datasets library=work memtype=data nolist nowarn;
            delete rtf_temp1 rtf_temp2 rtf_temp3 rtf_temp4 rtf_temp5 qc_ods_rtf
                   tot_temp rtf_input rtf_input1 rtf_tit_foot1 rtf_tit_foot2 temp000;
        quit;
    %end;

    %mend get_final;

    *** Call get_final macro to create SAS dataset PRIMARY_FINAL ***;
    %include "/u01/home/hchen/macros/get_final.sas";
    %get_final(inpgm_folder=%str(/u01/home/hchen/company/drugname/protocols),
               pgm_folder=%str(/u01/home/hchen/company/drugname/protocols),
               pgm_name=l-dm,
               s_string=%nrstr(%prtsetup;),
               e_string=%nrstr(%pageprt;),debug=yes);

    * Use the backslash (\) character to escape any type of character ;
    * that might interfere with perl code. Use '%\*' if you want to ;
    * replace '%*' in SAS code;
    %get_final(inpgm_folder=%str(/u01/home/hchen/company/drugname/protocols),
               pgm_folder=%str(/u01/home/hchen/company/drugname/protocols),
               pgm_name=l-dm2,
               s_string=%nrstr(%\*prtsetup;),
               e_string=%nrstr(%\*pageprt;),debug=no);

    *** Begin of QC program ***;
    *** End of QC program ***;

    data qc_final;
        length col1-col8 $1000;
        set final;
        col1 = subjid;
        col2 = brthdtc;
        col3 = agex;
        col4 = sex;
        col5 = racex;
        col6 = ethnic;
        col7 = htx;
        col8 = wtx;
        keep col1-col8;
    run;
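With PRIMARY_FINAL and QC_FINAL both in the WORK library, the comparison step described under COMPLETING THE TASK could be run so that its listing is kept for audit, as the abstract suggests. The following is a minimal sketch rather than the author's code: the macro variables audit_dir and cmp_rc and the file name l-dm-compare.lst are hypothetical, and it assumes the LISTING destination is open (the modified program reopens it with ODS LISTING).

```sas
/* Hypothetical folder for the saved comparison listing; adjust as needed. */
%let audit_dir = /u01/home/hchen/company/drugname/protocols;

/* Route procedure output to a file so the listing can be retained. */
proc printto print="&audit_dir./l-dm-compare.lst" new;
run;

proc compare base=primary_final compare=qc_final listall;
run;

/* Capture SYSINFO immediately after the PROC COMPARE step, before any */
/* other step resets it; 0 means no differences were detected.         */
%let cmp_rc = &sysinfo;

/* Restore the default output destination. */
proc printto;
run;

%put NOTE: PROC COMPARE return code for l-dm = &cmp_rc;
```

Saving the return code alongside the listing gives a quick pass/fail indicator, while the listing itself records any differences in detail.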

APPENDIX III (SAS DATASETS):