Quick and Efficient Way to Check the Transferred Data Divyaja Padamati, Eliassen Group Inc., North Carolina.

Similar documents
Automated Checking Of Multiple Files Kathyayini Tappeta, Percept Pharma Services, Bridgewater, NJ

What Do You Mean My CSV Doesn t Match My SAS Dataset?

An Efficient Method to Create Titles for Multiple Clinical Reports Using Proc Format within A Do Loop Youying Yu, PharmaNet/i3, West Chester, Ohio

SAS Programming Techniques for Manipulating Metadata on the Database Level Chris Speck, PAREXEL International, Durham, NC

Give me EVERYTHING! A macro to combine the CONTENTS procedure output and formats. Lynn Mullins, PPD, Cincinnati, Ohio

A SAS Macro Utility to Modify and Validate RTF Outputs for Regional Analyses Jagan Mohan Achi, PPD, Austin, TX Joshua N. Winters, PPD, Rochester, NY

The Power of PROC SQL Techniques and SAS Dictionary Tables in Handling Data

An Alternate Way to Create the Standard SDTM Domains

Data Edit-checks Integration using ODS Tagset Niraj J. Pandya, Element Technologies Inc., NJ Vinodh Paida, Impressive Systems Inc.

Taming a Spreadsheet Importation Monster

Sorting big datasets. Do we really need it? Daniil Shliakhov, Experis Clinical, Kharkiv, Ukraine

Quick Data Definitions Using SQL, REPORT and PRINT Procedures Bradford J. Danner, PharmaNet/i3, Tennessee

Creating and Executing Stored Compiled DATA Step Programs

Tales from the Help Desk 6: Solutions to Common SAS Tasks

A SAS Macro to Generate Caterpillar Plots. Guochen Song, i3 Statprobe, Cary, NC

Using GSUBMIT command to customize the interface in SAS Xin Wang, Fountain Medical Technology Co., ltd, Nanjing, China

Correcting for natural time lag bias in non-participants in pre-post intervention evaluation studies

Paper HOW-06. Tricia Aanderud, And Data Inc, Raleigh, NC

PharmaSUG 2013 CC26 Automating the Labeling of X- Axis Sanjiv Ramalingam, Vertex Pharmaceuticals, Inc., Cambridge, MA

PharmaSUG Paper PO10

Advanced Visualization using TIBCO Spotfire and SAS

Create Metadata Documentation using ExcelXP

esubmission - Are you really Compliant?

Journey to the center of the earth Deep understanding of SAS language processing mechanism Di Chen, SAS Beijing R&D, Beijing, China

A Tool to Compare Different Data Transfers Jun Wang, FMD K&L, Inc., Nanjing, China

The GEOCODE Procedure and SAS Visual Analytics

Clinical Data Visualization using TIBCO Spotfire and SAS

One Project, Two Teams: The Unblind Leading the Blind

SAS Catalogs. Definition. Catalog Names. Parts of a Catalog Name CHAPTER 32

PharmaSUG China Paper 059

The Power of Combining Data with the PROC SQL

CC13 An Automatic Process to Compare Files. Simon Lin, Merck & Co., Inc., Rahway, NJ Huei-Ling Chen, Merck & Co., Inc., Rahway, NJ

Quality Control of Clinical Data Listings with Proc Compare

Application of Modular Programming in Clinical Trial Environment Mirjana Stojanovic, CALGB - Statistical Center, DUMC, Durham, NC

CDISC Variable Mapping and Control Terminology Implementation Made Easy

Simplifying Your %DO Loop with CALL EXECUTE Arthur Li, City of Hope National Medical Center, Duarte, CA

So Much Data, So Little Time: Splitting Datasets For More Efficient Run Times and Meeting FDA Submission Guidelines

Extending the Scope of Custom Transformations

SAS File Management. Improving Performance CHAPTER 37

Data Quality Review for Missing Values and Outliers

Mapping Clinical Data to a Standard Structure: A Table Driven Approach

A Breeze through SAS options to Enter a Zero-filled row Kajal Tahiliani, ICON Clinical Research, Warrington, PA

Performance Considerations

Cover the Basics, Tool for structuring data checking with SAS Ole Zester, Novo Nordisk, Denmark

Submitting SAS Code On The Side

PharmaSUG China Mina Chen, Roche (China) Holding Ltd.

Storing and Reusing Macros

Efficient Processing of Long Lists of Variable Names

Useful Tips When Deploying SAS Code in a Production Environment

Using Cross-Environment Data Access (CEDA)

A SAS Macro to Create Validation Summary of Dataset Report

How to validate clinical data more efficiently with SAS Clinical Standards Toolkit

Basic SAS Hash Programming Techniques Applied in Our Daily Work in Clinical Trials Data Analysis

Amie Bissonett, inventiv Health Clinical, Minneapolis, MN

SAS Viya 3.1 FAQ for Processing UTF-8 Data

A Few Quick and Efficient Ways to Compare Data

Create a Format from a SAS Data Set Ruth Marisol Rivera, i3 Statprobe, Mexico City, Mexico

Automation of SDTM Programming in Oncology Disease Response Domain Yiwen Wang, Yu Cheng, Ju Chen Eli Lilly and Company, China

SAS Clinical Data Integration 2.4

PharmaSUG China. Systematically Reordering Axis Major Tick Values in SAS Graph Brian Shen, PPDI, ShangHai

PharmaSUG Paper CC11

Regaining Some Control Over ODS RTF Pagination When Using Proc Report Gary E. Moore, Moore Computing Services, Inc., Little Rock, Arkansas

Missing Pages Report. David Gray, PPD, Austin, TX Zhuo Chen, PPD, Austin, TX

How to Create Data-Driven Lists

PharmaSUG Paper PO12

PharmaSUG Paper CC02

Submission-Ready Define.xml Files Using SAS Clinical Data Integration Melissa R. Martinez, SAS Institute, Cary, NC USA

An Efficient Tool for Clinical Data Check

Tracking Dataset Dependencies in Clinical Trials Reporting

LST in Comparison Sanket Kale, Parexel International Inc., Durham, NC Sajin Johnny, Parexel International Inc., Durham, NC

PhUSE US Connect 2018 Paper CT06 A Macro Tool to Find and/or Split Variable Text String Greater Than 200 Characters for Regulatory Submission Datasets

Dictionary.coumns is your friend while appending or moving data

Arthur L. Carpenter California Occidental Consultants, Oceanside, California

Clinical Standards Toolkit 1.7

WHAT ARE SASHELP VIEWS?

PharmaSUG Paper TT11

SAS Clinical Data Integration 2.6

Planting Your Rows: Using SAS Formats to Make the Generation of Zero- Filled Rows in Tables Less Thorny

Utilizing the VNAME SAS function in restructuring data files

Interactive Programming Using Task in SAS Studio

A Macro to Create Program Inventory for Analysis Data Reviewer s Guide Xianhua (Allen) Zeng, PAREXEL International, Shanghai, China

PharmaSUG Paper AD06

Data Representation. Variable Precision and Storage Information. Numeric Variables in the Alpha Environment CHAPTER 9

Automate Clinical Trial Data Issue Checking and Tracking

PhUSE US Connect 2019

A Format to Make the _TYPE_ Field of PROC MEANS Easier to Interpret Matt Pettis, Thomson West, Eagan, MN

CHAPTER 7 Examples of Combining Compute Services and Data Transfer Services

SDTM Attribute Checking Tool Ellen Xiao, Merck & Co., Inc., Rahway, NJ

ABSTRACT INTRODUCTION MACRO. Paper RF

An Efficient Solution to Efficacy ADaM Design and Implementation

Data Set Options. Specify a data set option in parentheses after a SAS data set name. To specify several data set options, separate them with spaces.

DBLOAD Procedure Reference

Geocoding Crashes in Limbo Carol Martell and Daniel Levitt Highway Safety Research Center, Chapel Hill, NC

Countdown of the Top 10 Ways to Merge Data David Franklin, Independent Consultant, Litchfield, NH

SAS Publishing. Configure SAS. Forecast Server 1.4. Stored Processes

Plot Your Custom Regions on SAS Visual Analytics Geo Maps

ABSTRACT INTRODUCTION WORK FLOW AND PROGRAM SETUP

Automating Preliminary Data Cleaning in SAS

Exploring DICTIONARY Tables and SASHELP Views

Creating an ADaM Data Set for Correlation Analyses

Transcription:

ABSTRACT PharmaSUG 2016 - Paper QT03 Quick and Efficient Way to Check the Transferred Data Divyaja Padamati, Eliassen Group Inc., North Carolina. Consistency, quality and timelines are the three milestones that every department strives to achieve. As SAS Programmers, we often have to work with data generated by another team or a competitor company. Checking the data for compliance with CDISC or company standards will be a major hurdle in latter cases. In this paper we shall discuss how data compliance can be achieved in a quick and efficient manner. The first step in this process will be checking for redundant data i.e., removing any variables with all missing data or an observation where all the variables are missing. Secondly, we need to make a quick check for the presence of required variables as per the required standards. Finally, we have to create a list of required variables which are absent/not in transferred data. By letting SAS do all the work for us, the result will be faster and efficient. This paper will offer an approach to achieving all these goals by a click of a button. The skill level needed is intermediate to advance. INTRODUCTION Improving quality and efficiency in clinical trial data transformation is a never ending process. In this macro, we check for the consistency of the transferred data. We will be checking for variables which are expected to be present in the transferred data but are not present. In order to achieve this list of variables we need to follow below steps, 1. Remove any unwanted/redundant data from the transferred data. 2. Check the variables from transferred data and source data. 3. Create a listing of variables which are not present in the transferred data. We shall see how each task has been achieved individually first. The entire code with nested macro is present at the end of the paper. MACRO INPUT PARAMETERS INDT: Input dataset INDS: Input dataset used to delete the redundant data. TYPE: Type of variable to specify in the array definition. OUTDT: Output dataset with list of variables. OUTDS: Output dataset after all the modification has been done. SOURCE: Location of source data. NUM_CNT: Number of numeric variables. CHAR_CNT: Number of character variables. TRANSFER: Location of transferred data. Step 1: Remove redundant data To remove all the unnecessary data, we can use the below macro. First we shall create macro variables NUM_CNT and CHAR_CNT which have values of number of numeric and character variables respectively. We define an array for the numeric/character variables in the dataset, and also a flag array for which the dimensions will be same as numeric/character array and initial values will be set as Missing assuming that all numeric/character variables have missing values. This flag array will be updated to Not Missing based on the non-missing values present for the numeric/character variables in the dataset. Once we reach the last observation of the input dataset, we check for the values in the flags array. If there are any variables whose corresponding flag value is still Missing as we first assigned, then we create a macro variable with the list of these variables. Later we use this macro variable and drop all the character/numeric variables which have missing values for all the observations. set &transfer.&inputdt. ; /* Main input dataset */ data _NULL_; set transfer_inds (obs=1); 1

/* Creating MV num_cnt for no. of numeric variables */ call symput ('num_cnt', put(dim(numarray),best.)); /* Creating MV char_cnt for no. of character variables */ call symput ('char_cnt', put(dim(chararray),best.)); %let num_cnt = &num_cnt. ; %let char_cnt = &char_cnt. ; /* Creating macro variables with all character/numeric variable names */ proc sql ; select name into :char separated by ', ' and strip(lowcase(type))= char ; select name into :num separated by ', ' and strip(lowcase(type))='num' ; quit ; /* Deleting the observations where all the variables have missing values */ set transfer_inds ; number_char_missing=cmiss(&char) ; number_num_missing=nmiss(&num) ; if number_char_missing=dim(chararray) and number_num_missing=dim(numarray) then delete ; %macro del_vars (type=,inds=,outds=,flgcnt=) ; data &outds. ; set &inds. end=lastobser; length droplist $1000 ; /* Defining an array for variables to be checked */ array numarray {*} &type. ; /* Defining a flag array. */ array flgarray {*} $20 flgarray1-flgarray&flgcnt. (&flgcnt. * 'Missing'); /* Updating the value of flag */ do i=1 to dim(numarray); if missing(numarray[i])=0 then flgarray[i]='not Missing'; /* Creating the droplist macro variable */ if lastobser then do; do i=1 to dim(flgarray); if flgarray[i]='missing' then droplist=trim(left(droplist)) ' ' trim(vname(numarray[i])); call symput('droplist', droplist); data &outds._1 ; set &outds. ; 2

drop &droplist. flgarray: droplist i ; %mend del_vars ; /* Executing the %del_vars macro to delete numeric variables */ %del_vars (type=_numeric_,inds=transfer_inds,outds=final1,flgcnt=&num_cnt.) ; /* As we already removed the numeric variables in the above macro execution, the output of the above macro will be input to the second execution as we still need to delete character variables with all missing data */ %del_vars (type=_char_,inds=final1_1,outds=final2,flgcnt=&char_cnt.) ; Now, in final2_1.sas7bdat we have data with variables which have non-missing data. Step 2: Check the variables from transferred data and source data. Next step, is to check for variables which are not present in the transferred data. In order to do this we shall make use of CONTENTS procedure and create data which contains the names of the available variables. /*Getting the list of variables from source data and transferred data */ proc contents data = final2_1 out=transfer(keep=name type); proc contents data = &SOURCE..&INDT. out=source(keep=name type); Once we get the meta-data of the required datasets, we MERGE the transferred and source data to identify the variables which are not present or extra in transferred data. Step 3: Create a listing of variables which are not present in the transferred data. /*identifying the Variables which are present in source data but not in transferred data and vice versa. */ data &outdt. ; merge source(in=a) transfer(in=b) ; by name type ; if (a and not b) or (b and not a) ; CONCLUSION Quality check is a major task in any industry and working with transferred data is a very common scenario. This macro provides an extra layer of quality assurance in such cases. This macro can be used to lay ground work and give preliminary review of the data. REFERENCES SAS 9.2 Macro Language: Reference. Cary, NC: SAS Institute Inc. SAS 9.2 Language Reference: Dictionary. Cary, NC: SAS Institute Inc. ACKNOWLEDGMENTS Eliassen Group Biometrics and Data Solutions 1005 Slater Rd #100, Durham, NC 27703 3

CONTACT INFORMATION Your comments and questions are valued and encouraged. Contact the author at: Name: Divyaja R Padamati Enterprise: Eliassen Group Biometrics and Data Solutions Address: 1005 Slater Rd #100 City, State ZIP: Durham, NC 27703 E-mail: dpadamati@eliassen.com Web: egbiometrics.com SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. indicates USA registration. Other brand and product names are trademarks of their respective companies. 4

/* Purpose of the macro: To identify the non-missing variables which are present in source data but not in transferred data. Step 1. Delete the variables from transferred data where all the values are missing. ie., a variable for which all observations are missing. Step 2. Create 2 new datasets with only the variable names and their types using procedure contents. Step 3. Merge the source data and transferred data by name and type to find any unnecessary or extra data. */ %macro varcheck (inputdt=,outdt=,source=,transfer=); libname transfer "&transfer." access=readonly ; libname source "&source." access=readonly ; /* Clearing the work library */ proc datasets lib=work kill ; quit ; /*Step 1. Deleting redundant data i.e., variables/observations with all missing data. */ set &source..&inputdt. ; data _NULL_; set transfer_inds (obs=1); call symput ('num_cnt', put(dim(numarray),best.)); call symput ('char_cnt', put(dim(chararray),best.)); %let num_cnt = &num_cnt. ; %let char_cnt = &char_cnt. ; /* Creating macro variables with all character/numeric variable names */ proc sql ; select name into :char separated by ', ' and strip(lowcase(type))= char ; select name into :num separated by ', ' and strip(lowcase(type))='num' ; quit ; /* Deleting the observations where all the variables have missing values */ set transfer_inds ; number_char_missing=cmiss(&char) ; number_num_missing=nmiss(&num) ; if number_char_missing=dim(chararray) and number_num_missing=dim(numarray) then delete ; 5

%macro del_vars (type=,inds=,outds=,flgcnt=) ; data &outds. ; set &inds. end=lastobser; length droplist $1000 ; array numarray {*} &type. ; array flgarray {*} $20 flgarray1-flgarray&flgcnt. (&flgcnt. * 'Missing'); do i=1 to dim(numarray); if missing(numarray[i])=0 then flgarray[i]='not Missing'; if lastobser then do; do i=1 to dim(flgarray); if flgarray[i]='missing' then droplist=trim(left(droplist)) ' ' trim(vname(numarray[i])); call symput('droplist', droplist); data &outds._1 ; set &outds. ; drop &droplist. flgarray: droplist i ; %mend del_vars ; %del_vars (type=_numeric_,inds=transfer_inds,outds=final1,flgcnt=&num_cnt.) ; %del_vars (type=_char_,inds=final1_1,outds=final2,flgcnt=&char_cnt.) ; /*Step 2. Create 2 new datasets with only the variable names and their types using procedure contents.*/ proc contents data=final2_1 out=transfer(keep=name type); proc contents data=&source..&indt. out=source(keep=name type) ; /*Step 3. Identifying the variables which are present in source data but not in transferred data and vice versa. data &outdt. ; merge source(in=a) transfer(in=b) ; by name type ; if (a and not b) or (b and not a) ; %mend varcheck ; SAMPLE CALL: %VARCHECK (indt=ae, outdt=ae_varchk, source=%str(/stats/study/data), transfer=%str(/stats/study/transfer/data) ) ; 6