Quality Control of Clinical Data Listings with Proc Compare


Robert Bikwemu, Pharmapace, Inc., San Diego, CA
Nicole Wallstedt, Pharmapace, Inc., San Diego, CA

ABSTRACT

Checking clinical data listings with proc compare is a quick way to validate the order, completeness, and content of the listings. This is valuable and efficient when working with hundreds or thousands of pages per listing. Here we introduce the setup steps that both the developer and the quality control (QC) programmer need to take, and suggest steps a QC programmer can take when validating large listings.

INTRODUCTION

The QC process for listings requires verifying the number of records, assuring the accuracy of the data sorting order and content, and checking the formats, column headers, titles, and footnotes. The first two aspects can be accomplished using the steps described in this paper; the third still has to be done by manual review, and reviewing the first page of the output mostly accomplishes it. As a result, using proc compare for the remaining data verification has the potential to dramatically reduce the QC effort for listings. In this paper, we review the roles of the developer and the QC programmer and finish with an example using the SASHELP.CLASS dataset.

LISTING DEVELOPER'S ROLE

To implement these steps, the developer of the listings first needs to define a folder location for the saved dataset using the LIBNAME statement. Second, the developer has to save the dataset shown in the listing to that library. This can be accomplished in two ways. One is to save the dataset with a data step, as seen in Output 1.

    libname savelist "G:/dev/project4/listings/qc";

    data savelist.l_16_1_dm;
        set ae4;
    run;

Output 1. Save Dataset with a Data Step

The other way is to save the dataset while outputting the listing, using proc report's built-in OUT= option. The option is added to the PROC REPORT statement itself, the first line of the proc report; see Output 2 below.

Please note that Output 2 is preferred because it gives the most accurate reflection of the listing. Additional data manipulation is needed when using this option; see step 3 under Steps to Combat Common Discrepancies.

    libname savelist "G:/dev/project4/listings/qc";

    proc report data=sashelp.class out=savelist.l_16_1_dm;
        column sex name age height weight;
    run;

Output 2. Save Dataset with PROC REPORT's Built-in OUT= Option

QC PROGRAMMER'S ROLE

The QC programmer's role is to independently generate a dataset and compare it to the developer's dataset with proc compare. Before we use proc compare, there are a few steps, discussed below, that should be taken to ensure the datasets can be compared.

STEPS TO COMBAT COMMON DISCREPANCIES

1. Open the listing before creating the QC dataset to get an understanding of the source datasets and the data sort or display ordering.
2. Check that the date of the developer's dataset is consistent with the version you want to compare, and use the ACCESS=READONLY option in the LIBNAME statement to prevent overwriting the developer's dataset.
3. Delete all rows with a non-missing _BREAK_ value from the developer's dataset when it was produced by proc report's OUT= option. These rows indicate either summary rows or paging rows that contain retained data.
4. Review the source datasets against the listing and the developer's dataset to identify the variables used for the listing.
5. Use proc compare to check whether the developer used any formats to display data, and double-check that the length of each variable is long enough to fit its contents.
6. Use proc compare's ID statement to list all the variables needed to make each row unique, e.g. sex, age, and name.
7. Either rename variables so that corresponding columns have the same name in the developer's dataset and the QC programmer's dataset, or list the developer's variables in proc compare's VAR statement and the matching QC variables in the WITH statement.
8. Ensure you are using the correct population and the correct merge statement.
9. Once the number of observations matches, run proc compare and use the output as your guide. Find the first observation that is discrepant and pull up the datasets to compare them side by side.

POTENTIAL BLIND SPOTS WITH PROC COMPARE

1. MISSING VARIABLES: Confirm you are comparing the desired variables.
2. MISSING OBSERVATIONS: Before comparing datasets, confirm that the observation counts of both datasets match.
3. CONFLICTING TYPES: Different types for the same variable name may occur because of re-formatting to adjust order (e.g. SEXN and SEX in the example).
4. MISMATCHED ID VARIABLES: The ID statement in proc compare lists the variable(s) on which to match observations, and if the distributions of these variables differ between datasets, it can lead to problems. One solution is to run proc freq on the ID variables, with the LIST option in the TABLES statement, to see if the counts per stratum match between datasets.

EXAMPLES: SASHELP.CLASS

The class dataset contains five variables: two character variables, sex and name, and three numeric variables, age, height, and weight. Table 1 is an example output of the dataset with Sex, Age, and Name as the unique ID for the listing.

    Sex     Name     Age  Height (cm)  Weight (lb)
    Male    Thomas   11   57.5         85
    Male    James    12   57.3         83
    Male    John     12   59           99.5
    Male    Robert   12   64.8         128
    Male    Jeffrey  13   62.5         84
    Male    Henry    14   63.5         102.5
    Male    Alfred   14   69           112.5
    Male    William  15   66.5         112
    Male    Ronald   15   67           133
    Male    Philip   16   72           150
    Female  Joyce    11   51.3         50.5
    Female  Louise   12   56.3         77
    Female  Jane     12   59.8         84.5
    Female  Alice    13   56.5         84
    Female  Barbara  13   65.3         98
    Female  Carol    14   62.8         102.5
    Female  Judy     14   64.3         90
    Female  Janet    15   62.5         112.5
    Female  Mary     15   66.5         112

Table 1. Example Listing of the SASHELP.CLASS Dataset

Display 1 shows an example of a dataset made by a developer, and Display 2 shows the proc contents of that dataset. Note that Display 1 has three more variables than Table 1: SEXN, AGE1, and _BREAK_. We need to assess whether they are used to support the outputted listing.

First, _BREAK_ indicates that the dataset was made by proc report, and all rows where _BREAK_ is blank need to be verified for accuracy. If _BREAK_ is not blank, it is likely an indication of a paging row or summary row that contains repeated values of the previous row.

AGE1 and AGE seem to be identical, but looking at proc contents (Display 2) we see that AGE1 is the character form of AGE, and that AGE is the variable displayed because its label matches the listing header (Table 1). Proc contents also shows that SEXN is the numeric version of the SEX variable seen in Display 1, and that SEX is formatted while its underlying value has a length of 1. This indicates that a format was used for the SEX variable to output Male and Female.

Display 1. Developer's Dataset

Display 2. PROC CONTENTS of the Developer's Dataset
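As a minimal sketch of steps 2 and 3 and of the proc contents check described above, the QC programmer might take a working copy of the developer's dataset as follows. The library path and dataset name are assumed from Output 2, and the one-level work dataset name is an assumption chosen to match the name used in the later examples.

```sas
/* Assumed path and dataset name, taken from Output 2.              */
/* ACCESS=READONLY prevents accidental overwrites of the developer's */
/* dataset.                                                          */
libname savelist "G:/dev/project4/listings/qc" access=readonly;

/* Work copy that keeps only rows where _BREAK_ is blank, i.e. the   */
/* detail rows; summary and paging rows are dropped.                 */
data l_16_1_dm;
    set savelist.l_16_1_dm;
    where missing(_break_);
    drop _break_;
run;

/* Review types, lengths, formats, and labels, as in Display 2 */
proc contents data=l_16_1_dm;
run;
```

Working on a copy in WORK keeps the saved dataset untouched while the QC programmer filters and sorts it.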

From the above information, we see that we need to use SEXN (the numeric form of SEX), AGE, HEIGHT, and WEIGHT to order the dataset. Again, AGE1 is not needed for sorting and is not in the output, so we don't need to compare this variable. Also, while NAME is part of the unique ID of the table (sex, age, name), it is not used for sorting.

One way to check the distribution of the rows is to run proc freq on the strata you are interested in. In this case, we are interested in the distribution of SEX and AGE (we could equally use SEXN or AGE1 because they match their counterparts). Output 3 shows how to set up proc freq to provide the output in Display 3.

    proc freq data=l_16_1_dm;
        table sex*age / list missing;
    run;

Output 3. PROC FREQ to Check Distribution Between Developer's Dataset and QC's Dataset

Display 3. List View of the PROC FREQ From Output 3

At this point, the counts and the distributions of the datasets match, the sorting is done, and we have the variables we want to compare. Now we can run proc compare on the developer's dataset (l_16_1_dm) versus the QC programmer's dataset (qc1), as seen in Output 4. Again, Output 4 works as written if the QC programmer names the variables to match the corresponding variables in the developer's dataset. If not, you will have to add VAR and WITH statements to specify which variables match each other. An example of the result of Output 4 is provided in Displays 4 through 6.

    proc compare base=l_16_1_dm compare=qc1 list;
        id sexn age name;
    run;

Output 4. PROC COMPARE Between Developer's Dataset and QC Programmer's Dataset
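For the case where the QC variable names differ, a hedged sketch of the VAR and WITH approach is shown below. The QC dataset here is hypothetical: the names HT and WT, the SEXN coding (1 = Male, 2 = Female), and the sorted copy dev_sorted are assumptions made for illustration, not details taken from the developer's program. Both datasets are sorted by the ID variables first, since proc compare expects that when an ID statement is used without the NOTSORTED option.

```sas
/* Hypothetical QC dataset: HT and WT are deliberately named         */
/* differently from the developer's HEIGHT and WEIGHT.               */
data qc1;
    set sashelp.class(rename=(height=ht weight=wt));
    /* Assumed coding so that males sort first, as in Table 1 */
    if sex = 'M' then sexn = 1;
    else if sex = 'F' then sexn = 2;
    keep sexn name age ht wt;
run;

/* Sort both datasets by the ID variables before comparing */
proc sort data=qc1;
    by sexn age name;
run;

proc sort data=l_16_1_dm out=dev_sorted;
    by sexn age name;
run;

/* VAR lists the developer's variables; WITH lists the QC            */
/* counterparts in the same order.                                   */
proc compare base=dev_sorted compare=qc1 listall;
    id sexn age name;
    var height weight;
    with ht wt;
run;
```

The ID variables must carry the same names in both datasets, so only the non-ID columns are paired through VAR and WITH.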

Display 4. Page 1 of Result of Output 4

Display 4 shows that both the developer's dataset and the QC programmer's dataset have 19 observations. It also shows that the developer's dataset has one more variable than the QC programmer's. In Display 5 we see that AGE1 is the variable that is missing, and in Display 6 we find that all compared variables are exactly equal.

Display 5. Page 2 of Result of Output 4

Display 6. Page 3 of Result of Output 4

CONCLUSION

Using proc compare can support the QC process for larger listings as well. All you need to do is follow these steps and be mindful that they only fulfill two parts of the QC process. First, they verify the number of records outputted from the original dataset. Second, they assure the accuracy of the order and contents of the records. However, these steps do not check the formats, column headers, titles, and footnotes, which can be checked manually by reviewing the first page of the listing.

REFERENCES

SAS Institute Inc. 2017. Base SAS 9.4 Procedures Guide, Seventh Edition. Cary, NC: SAS Institute Inc. https://support.sas.com/documentation/cdl/en/proc/70377/pdf/default/proc.pdf

Horstman, Joshua, and Muller, Roger. 2014. Don't Get Blindsided by PROC COMPARE. Proceedings of the 2014 SAS Global Conference. Washington, DC. Paper 1615-2014. http://support.sas.com/resources/papers/proceedings14/1615-2014.pdf

Chen, Honghua. 2012. Prove QC Quality Create SAS Dataset from RTF File. Proceedings of the 2012 SAS NESUG Conference. Baltimore, MD. http://www.lexjansen.com/nesug/nesug12/ph/ph03.pdf

Casas, Angelina. 2003. Proc Compare to Validate Datasets. Proceedings of the 2003 SAS PHARMASUG Conference. Miami, FL. http://www.lexjansen.com/pharmasug/2003/tutorials/tu056.pdf

ACKNOWLEDGMENTS

We would like to express our thanks to Jay Zhou, Xiaodong Li, and Dewei Li for reviewing this paper, and to Michelle Rossi and Debby Smith for their support as we worked on the beginning drafts.

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the authors at:

Name: Robert Bikwemu
Enterprise: Pharmapace, Inc.
Address: 10509 Vista Sorrento Parkway, Suite 303
City, State ZIP: San Diego, CA 92121
Work Phone: (858)263-0510
E-mail: Robert.Bikwemu@pharmapace.com

Name: Nicole Wallstedt
Enterprise: Pharmapace, Inc.
Address: 10509 Vista Sorrento Parkway, Suite 303
City, State ZIP: San Diego, CA 92121
Work Phone: (858)263-0510
E-mail: Nicole.Wallstedt@pharmapace.com

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.