PharmaSUG Paper TT11

Similar documents
Data Edit-checks Integration using ODS Tagset Niraj J. Pandya, Element Technologies Inc., NJ Vinodh Paida, Impressive Systems Inc.

PharmaSUG Paper PO12

An Efficient Tool for Clinical Data Check

A SAS Macro Utility to Modify and Validate RTF Outputs for Regional Analyses Jagan Mohan Achi, PPD, Austin, TX Joshua N. Winters, PPD, Rochester, NY

Create a Format from a SAS Data Set Ruth Marisol Rivera, i3 Statprobe, Mexico City, Mexico

A SAS Macro to Create Validation Summary of Dataset Report

So Much Data, So Little Time: Splitting Datasets For More Efficient Run Times and Meeting FDA Submission Guidelines

A Tool to Compare Different Data Transfers Jun Wang, FMD K&L, Inc., Nanjing, China

Amie Bissonett, inventiv Health Clinical, Minneapolis, MN

Sorting big datasets. Do we really need it? Daniil Shliakhov, Experis Clinical, Kharkiv, Ukraine

Developing Data-Driven SAS Programs Using Proc Contents

Run your reports through that last loop to standardize the presentation attributes

One Project, Two Teams: The Unblind Leading the Blind

Statistics and Data Analysis. Common Pitfalls in SAS Statistical Analysis Macros in a Mass Production Environment

Quick Data Definitions Using SQL, REPORT and PRINT Procedures Bradford J. Danner, PharmaNet/i3, Tennessee

Keeping Track of Database Changes During Database Lock

Data Integrity through DEFINE.PDF and DEFINE.XML

A Breeze through SAS options to Enter a Zero-filled row Kajal Tahiliani, ICON Clinical Research, Warrington, PA

Real Time Clinical Trial Oversight with SAS

Clinical Data Visualization using TIBCO Spotfire and SAS

%Addval: A SAS Macro Which Completes the Cartesian Product of Dataset Observations for All Values of a Selected Set of Variables

Reducing SAS Dataset Merges with Data Driven Formats

Taming a Spreadsheet Importation Monster

Extending the Scope of Custom Transformations

PharmaSUG Paper PO10

Using GSUBMIT command to customize the interface in SAS Xin Wang, Fountain Medical Technology Co., ltd, Nanjing, China

PharmaSUG China. Systematically Reordering Axis Major Tick Values in SAS Graph Brian Shen, PPDI, ShangHai

Missing Pages Report. David Gray, PPD, Austin, TX Zhuo Chen, PPD, Austin, TX

Using SAS/SCL to Create Flexible Programs... A Super-Sized Macro Ellen Michaliszyn, College of American Pathologists, Northfield, IL

Cover the Basics, Tool for structuring data checking with SAS Ole Zester, Novo Nordisk, Denmark

%ANYTL: A Versatile Table/Listing Macro

Automated Checking Of Multiple Files Kathyayini Tappeta, Percept Pharma Services, Bridgewater, NJ

Quality Control of Clinical Data Listings with Proc Compare

Reproducibly Random Values William Garner, Gilead Sciences, Inc., Foster City, CA Ting Bai, Gilead Sciences, Inc., Foster City, CA

An Efficient Method to Create Titles for Multiple Clinical Reports Using Proc Format within A Do Loop Youying Yu, PharmaNet/i3, West Chester, Ohio

Guide Users along Information Pathways and Surf through the Data

Hypothesis Testing: An SQL Analogy

Get Started Writing SAS Macros Luisa Hartman, Jane Liao, Merck Sharp & Dohme Corp.

Applications Development. Paper 38-28

PhUSE US Connect 2018 Paper CT06 A Macro Tool to Find and/or Split Variable Text String Greater Than 200 Characters for Regulatory Submission Datasets

Automate Clinical Trial Data Issue Checking and Tracking

PharmaSUG 2013 CC26 Automating the Labeling of X- Axis Sanjiv Ramalingam, Vertex Pharmaceuticals, Inc., Cambridge, MA

PharmaSUG Paper CC02

Using a Control Dataset to Manage Production Compiled Macro Library Curtis E. Reid, Bureau of Labor Statistics, Washington, DC

Application Interface for executing a batch of SAS Programs and Checking Logs Sneha Sarmukadam, inventiv Health Clinical, Pune, India

PharmaSUG China Paper 059

Program Validation: Logging the Log

PharmaSUG China

SDTM Attribute Checking Tool Ellen Xiao, Merck & Co., Inc., Rahway, NJ

Cleaning up your SAS log: Note Messages

PharmaSUG Paper SP04

Using SAS Macro to Include Statistics Output in Clinical Trial Summary Table

Validating Analysis Data Set without Double Programming - An Alternative Way to Validate the Analysis Data Set

What Do You Mean My CSV Doesn t Match My SAS Dataset?

SAS Clinical Data Integration 2.6

Validation Summary using SYSINFO

Facilitate Statistical Analysis with Automatic Collapsing of Small Size Strata

Internal Consistency and the Repeat-TFL Paradigm: When, Why and How to Generate Repeat Tables/Figures/Listings from Single Programs

How to write ADaM specifications like a ninja.

PharmaSUG Paper DS06 Designing and Tuning ADaM Datasets. Songhui ZHU, K&L Consulting Services, Fort Washington, PA

Utilizing SAS for Cross- Report Verification in a Clinical Trials Setting

A Quick and Gentle Introduction to PROC SQL

TLF Management Tools: SAS programs to help in managing large number of TLFs. Eduard Joseph Siquioco, PPD, Manila, Philippines

PhUSE US Connect 2019

Step through Your DATA Step: Introducing the DATA Step Debugger in SAS Enterprise Guide

Pharmaceuticals, Health Care, and Life Sciences

186 Statistics, Data Analysis and Modeling. Proceedings of MWSUG '95

Not Just Merge - Complex Derivation Made Easy by Hash Object

There s No Such Thing as Normal Clinical Trials Data, or Is There? Daphne Ewing, Octagon Research Solutions, Inc., Wayne, PA

Advanced Visualization using TIBCO Spotfire and SAS

An Efficient Solution to Efficacy ADaM Design and Implementation

Planning to Pool SDTM by Creating and Maintaining a Sponsor-Specific Controlled Terminology Database

Customized Flowcharts Using SAS Annotation Abhinav Srivastva, PaxVax Inc., Redwood City, CA

SAS Programming Techniques for Manipulating Metadata on the Database Level Chris Speck, PAREXEL International, Durham, NC

ABSTRACT INTRODUCTION MACRO. Paper RF

An Alternate Way to Create the Standard SDTM Domains

An Automation Procedure for Oracle Data Extraction and Insertion

Paper CC16. William E Benjamin Jr, Owl Computer Consultancy LLC, Phoenix, AZ

Power Data Explorer (PDE) - Data Exploration in an All-In-One Dynamic Report Using SAS & EXCEL

A Practical and Efficient Approach in Generating AE (Adverse Events) Tables within a Clinical Study Environment

Creating output datasets using SQL (Structured Query Language) only Andrii Stakhniv, Experis Clinical, Ukraine

Unit-of-Analysis Programming Vanessa Hayden, Fidelity Investments, Boston, MA

One SAS To Rule Them All

How to validate clinical data more efficiently with SAS Clinical Standards Toolkit

Making a List, Checking it Twice (Part 1): Techniques for Specifying and Validating Analysis Datasets

SDTM Automation with Standard CRF Pages Taylor Markway, SCRI Development Innovations, Carrboro, NC

How Managers and Executives Can Leverage SAS Enterprise Guide

The Proc Transpose Cookbook

PharmaSUG Paper TT10 Creating a Customized Graph for Adverse Event Incidence and Duration Sanjiv Ramalingam, Octagon Research Solutions Inc.

LST in Comparison Sanket Kale, Parexel International Inc., Durham, NC Sajin Johnny, Parexel International Inc., Durham, NC

CC13 An Automatic Process to Compare Files. Simon Lin, Merck & Co., Inc., Rahway, NJ Huei-Ling Chen, Merck & Co., Inc., Rahway, NJ

Checking for Duplicates Wendi L. Wright

Easy CSR In-Text Table Automation, Oh My

PharmaSUG Paper AD03

Making the most of SAS Jobs in LSAF

Regaining Some Control Over ODS RTF Pagination When Using Proc Report Gary E. Moore, Moore Computing Services, Inc., Little Rock, Arkansas

Creating Macro Calls using Proc Freq

Data Quality Review for Missing Values and Outliers

When Powerful SAS Meets PowerShell TM

Matt Downs and Heidi Christ-Schmidt Statistics Collaborative, Inc., Washington, D.C.

Transcription:

PharmaSUG 2014 - Paper TT11 What is the Definition of Global On-Demand Reporting within the Pharmaceutical Industry? Eric Kammer, Novartis Pharmaceuticals Corporation, East Hanover, NJ ABSTRACT It is not uncommon in the pharmaceutical industry to have standardized reporting for data management cleaning activities or clinical review. However, certain steps have to be taken into consideration when programming standardized reports, so they run properly for all studies. One particular troublesome area is when the global reports are run on-demand by non-programmers. As new data becomes available, original test cases that were used could be different, and the programs could fail. This paper will identify techniques to assist programmers on what to do when developing code, so programs can run successfully when unexpected surprises with data occur running SAS reports in an on-demand environment and give the customers information to assist in understanding the results. INTRODUCTION Global on-demand data cleaning reports are reports that can be run on any study and most likely by a nonprogrammer. Traditional programming allows a programmer to take SAS programs and revise it according to the study needs. This is true for program templates that are copied to one s study and then changed in order to get them running properly. Most programmers have stable or non-changing data for analyses from snap shots, interim locks, or final database locks. A global on-demand report for data cleaning will have different data every day. This can even include no data if run early in a study, or different data, as it is being refreshed in the database. The data can even be missing or bad data if run early in the data cleaning process. If a SAS error occurs, you do not have the luxury of debugging or looking at the SAS log, as the messages will occur right in front of the user s eyes and even possibly fail. This can be embarrassing and cause unnecessary headaches. A global on-demand data cleaning report will be used by a non-programmer and run on current data during the course of the trial. These users will send the reports to clinical or other data managers in the process of cleaning data assuming the results are what should be expected. A similar idea is defensive programming in trying to prevent issues that will occur in someone s program but in a global environment, and because non-programmers involved have to be taken to an even higher extreme. Types of issues One issue with user driven reports is when the report shows that there are no exceptions. One way of assisting the user is to show the number of records processed vs. number of occurrences, as it is important when reviewing reports. As an example, a user might run an exception report and see No observations found. However, is this a false positive; where the reason there are no records found is because there is actually no data to support the condition in the final dataset? One way to show if there was nothing to process is to give a message when there are no observations found in a report. %macro noreport(in=); /*********************************/ /* Check for zero observations */ /* in the dataset to be reported */ /*********************************/ DATA _null_; CALL SYMPUT('_nobs',LEFT(PUT(nobs,best.))); STOP; SET &in nobs=nobs; %IF &_nobs EQ 0 %THEN %DO; DATA noobs; LABEL text='***'; text='***************************************************'; OUTPUT; text='*** There are no observations for this report ***'; OUTPUT; 1

text='***************************************************'; OUTPUT; PROC PRINT LABEL noobs; %END; %mend noreport; Another problem when running a report against a SAS dataset is: has this dataset been created yet or did the user run the report too early in the data cleaning process? Does the file area being analyzed exist in the SAS environment? If the data does exist, when was the last data updated or created? Check if the data area was created. %global yes_no; %let yes_no=no; %macro doit(myfiler=); %if %sysfunc(fileexist(&myfiler)) %then %let yes_no=yes; %else %put The SAS file &myfiler does not exist.; %mend; Showing the number of records processed gives a count on data available for the report, and if the report showed that no observations existed, can prove there was data to be used for the report. The number of records processed allows the user to review the information. %macro numobs(in=); /******************************************************/ /* shows number of records processed in a dataset */ /*****************************************************/ proc contents data = &in out = counts noprint; set counts end=eof; if eof then do; x=compress(put(nobs,best.)); put "**************************************************************"; put "***" nobs "observations processed for dataset &in ***********"; put "**************************************************************"; end; %mend numobs; Show the date and time of the data created and number of observations. 2

%macro doobs(in=); proc contents data = &in out=newdms noprint; set newdms end=eof; if eof then do; call symput('crddate',put(crdate,datetime20.)); call symput('nobs',compress(put(nobs,best10.))); end; put "***********************************************"; put "&nobs observations processed for dataset &in "; put "Data was created on &crddate ******"; put "***********************************************"; %mend; %doobs(in=dmcount); Alternatively, a variable that you are analyzing might be completely empty or does not contain the data it is supposed to, such as, if derived database information is still not available (e.g. age between 0-30) or only having certain lab categories such as chemistry, hematology, or urinalysis. In this case, there are records processed, but there is no data to support the algorithm as possibly the derivation was not available at the time of the data update, or only chemistry and hematology exist. The user might even run the program on another study that they are assigned to by mistake thinking the results are acceptable. A frequency or range display can be used if data is presented showing the user that no records met the condition, and what data actually was available for the report. The extra benefit of having user specified parameters is that an opposite condition can be used for test purposes, which will prove to the user that the results given are true but not currently available with existing data (e.g If age < 18 vs. if age < 50). The following data was available for AGE, but no data was found to meet the conditions of the report or for the lab category, and no values were found for urinalysis. AGEDRV. Lab Category 0 Biochemistry 30 Hematology 31 44 50 59 61 101 Do not assume as it is best to pass variables or conditions using parameters and filters from a driver program, or have some metadata approach to use the variables that apply. Problematic: proc sort data=data_s.lb(keep= USUBJID LLBCAT) out=lab_all; where LLBCAT="HEMATOLOGY"; by USUBJID; Better; 3

%let vars = USUBJID LLBCAT; %let cond = LLBCAT="HEMATOLOGY"; %let dsn = lb; %let byvar = USUBJID; proc sort data=data_s.&dsn (keep=&vars) out=lab_all; where &cond; by &byvar; While it is very efficient to only keep those variables required, if you do not have some method ahead of time of checking to make sure the variables exist then the report could fail. Databases and standards change, so one needs to find ways of making the programs flexible, so they can be adaptable to new standards to avoid error messages. ERROR: The variable DOSADM in the DROP, KEEP, or RENAME list has never been referenced. If considering only a unit of g/dl then if this unit does not appear, a message should be printed displaying the values found (e.g. g/dl, g/l), so the user knows that there were values found outside of the consideration and a footnote or title should be given to indicate Note: Only values of g/l were considered for the program. 4

%macro novaluec(indsn=,invar=); /*********************************************************/ /* shows the range of values for character values */ /*****************************************************/ put "*************************************************************"; put "* No Data selected for &invar from dataset &indsn "; put "** Here are values collected for variable &invar ************"; put "*************************************************************"; proc freq data = &indsn; tables &invar; %mend novaluec; %macro novaluen(indsn=,invar=); /*********************************************************/ /* shows the values for a numerical result */ /*****************************************************/ proc means data = &indsn noprint; var &invar; output out = meany min=min max=max; set meany; put "****************************************************************"; put "* No Data selected for &invar from dataset &indsn "; put "Records for &invar-->mimimum=" min " Maximum= " max "from &indsn"; put "****************************************************************"; %mend novaluen; %macro noreslt(theinn=,thevar=); /******************************************************/ /* if no data supported algorithm then display the results */ /*****************************************************/ proc contents data = &theinn out=newdms noprint; %let charnum=0; set newdms; if upcase(name)=upcase("&thevar") then call symput('charnum',put(type,1.)); %if &charnum=2 %then %do; %novaluec(indsn=&theinn, invar=&thevar); %else %if &charnum=1 %then %do; %novaluen(indsn=&theinn, invar=&thevar); %else %do; %put Note: The Variable name was not found on Dataset &theinn Variable &thevar; %mend; 5

Many times when an on-demand report is run, a file is created from SAS, either as a SAS listing, Excel, or pdf. If a file is created for later review, the description, the user name, date, time with seconds have to be included, otherwise, if two users run the report accidently at the same time, you will have interference from the filenames and can actually lock the on-demand program while being executed. The naming convention is also helpful if you are creating an audit trail of the report usage. Each of these outputs poses problems especially if large record sizes are generated, which are difficult to work with. Simple messages such as "warning it could take a while for the Excel sheet to open, as there are >50,000 rows of data present maybe rerun and subset your data. Such a warning could then make the user suspect that something is wrong with the data or even the report output. If there is an issue, it might be best to have the program stop and give a message to the user that the data did not support the current condition or something went wrong. How will the user know the program failed and did not run? One important feature is to actually print a message in the output area to let the user know that the program started, processed a certain amount of data, finished executing, and is in the final stages of processing. These messages will actually help the programmer debug any issues from the program, and facilitate a more accurate review of the output. If the program completes successfully, summarize the various messages that occur running SAS programs (e.g. warnings, uninitialized, important notes), as non-programmers will not understand these. 2 OCCURRENCES OF POSSIBLE ERRORS FOUND 00:26 Thursday, March 27,2014 Line Description 6 WARNING: Unable to copy SASUSER registry to WORK registry. 1078 WARNING: Variable DOSADM is uninitialized Users or programmers can make mistakes in assumptions, so it is important to decide with upper or lower case for character data and show the assumptions made to produce the report. However, if the program assumes a certain entry, then a message should be given to let the user know that an expectation was not met. Obviously, this cannot be done on each field; just key conditions. As mentioned, another very important feature of assisting users and meeting the requirements of global reports is having user entered prompts that actually can control running the program. This can give flexibilities in creating subset reports such as baselines and lab parameters. This can result in less data being printed or adding important considerations such as dates, patients, and/or data milestone indications. Baseline values could be visit 2 or 3, are good examples that change from study to study and are needed sometimes in order for reports to run properly. Variables might also change to indicate specifics (e.g. total daily dose, dose administered vs. actual dose). Optional variables vs. mandatory variables, and then the user who knows the study better can assess what is needed. In this way, user parameters can control certain elements of your global reports. Here is an example of code for a dataset when the variable or dataset selected does not exist. 6

%macro novardsn(invar=,indsn=,liber=); proc contents data = &liber.._all_ out = &liber(keep=memname name) noprint; proc sort nodupkey data=&liber; by memname name; data varfound dsnfound; set &liber; if trim(upcase(name)) = trim(upcase("&invar")) then output varfound; if trim(upcase(memname)) = trim(upcase("&indsn")) then output dsnfound; %let nbs=0; %let dsn=0; set varfound end=eof; if eof then call symput('nbs',put(_n_,10.)); set dsnfound end=eof; if eof then call symput('dsn',put(_n_,10.)); %if &dsn=0 %then %do; put "***Dataset required &indsn not found***"; %else %if &nbs=0 %then %do; put "***Variable &invar not found on dataset &indsn***"; %mend; The report, when run by the user, should give a complete history of what happened at that point in the time, as each day, the data will be different from the database. The data might be extracted to SAS datasets or taken directly from an Oracle database. Is the output draft or production, time and date run, location of report in the programming environment, name of the program, study that was processed, raw files or analysis files, user that ran the report, number of records processed, date of the last update to the database to show how recent data is, was data available for the variables, is the dataset or variable required found, and assumptions such as filters or parameters? All of the concepts considered in this paper are not part of the main body of the program but added separately. As mentioned, there can be user entered parameters before the program begins execution (e.g. baseline visit numbers) or a driver file at the beginning or end of the program whichever makes sense, and this can control the program, so a minor change does not require change control or additional documentation. 7

CONCLUSION Data cleaning reports pose special problems when run by non-programmers on-demand, as new data arrives on an ongoing basis. It is important to take special steps while programming to let the users know why a particular report might have produced either no results or different results than expected. Using basic SAS functionality, a programmer can present messages to the user giving answers as to why report output was produced. Messages are not only helpful to the user, but if a question is asked; a programmer can then assist the user as why the report behaved the way it did. The same concepts can be applied to each report, so extra programming or change control is not required. Basically, an entire history of the report regarding data can be displayed which allows any data review report to be more functional, and in the end, having fewer questions by the user, management, and the reviewer of the results of the reports. This includes showing the user that the report has begun, various steps of execution, giving data results as a summary to clarify the report or no data results, and at the end, summarizing the status and any error messages of the report. CONTACT INFORMATION Your comments and questions are valued and encouraged. Contact the author at: Name: Eric Kammer Enterprise: Novartis Pharmaceuticals Address: One Health Plaza City, State ZIP: East Hanover, NJ 07936 E-mail: eric.kammer@novartis.com SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. indicates USA registration. Other brand and product names are trademarks of their respective companies. 8