Automating Comparison of Multiple Datasets Sandeep Kottam, Remx IT, King of Prussia, PA

Similar documents
Data Quality Review for Missing Values and Outliers

ODS TAGSETS - a Powerful Reporting Method

Correcting for natural time lag bias in non-participants in pre-post intervention evaluation studies

Document and Enhance Your SAS Code, Data Sets, and Catalogs with SAS Functions, Macros, and SAS Metadata. Louise S. Hadden. Abt Associates Inc.

Using a Fillable PDF together with SAS for Questionnaire Data Donald Evans, US Department of the Treasury

Automated Checking Of Multiple Files Kathyayini Tappeta, Percept Pharma Services, Bridgewater, NJ

PharmaSUG 2013 CC26 Automating the Labeling of X- Axis Sanjiv Ramalingam, Vertex Pharmaceuticals, Inc., Cambridge, MA

David S. Septoff Fidia Pharmaceutical Corporation

ABSTRACT MORE THAN SYNTAX ORGANIZE YOUR WORK THE SAS ENTERPRISE GUIDE PROJECT. Paper 50-30

A Few Quick and Efficient Ways to Compare Data

Tracking Dataset Dependencies in Clinical Trials Reporting

Quality Control of Clinical Data Listings with Proc Compare

Pros and Cons of Interactive SAS Mode vs. Batch Mode Irina Walsh, ClinOps, LLC, San Francisco, CA

Create a Format from a SAS Data Set Ruth Marisol Rivera, i3 Statprobe, Mexico City, Mexico

A Visual Step-by-step Approach to Converting an RTF File to an Excel File

KEYWORDS Metadata, macro language, CALL EXECUTE, %NRSTR, %TSLIT

Regaining Some Control Over ODS RTF Pagination When Using Proc Report Gary E. Moore, Moore Computing Services, Inc., Little Rock, Arkansas

Data Edit-checks Integration using ODS Tagset Niraj J. Pandya, Element Technologies Inc., NJ Vinodh Paida, Impressive Systems Inc.

Amie Bissonett, inventiv Health Clinical, Minneapolis, MN

Prove QC Quality Create SAS Datasets from RTF Files Honghua Chen, OCKHAM, Cary, NC

footnote1 height=8pt j=l "(Rev. &sysdate)" j=c "{\b\ Page}{\field{\*\fldinst {\b\i PAGE}}}";

A Macro that can Search and Replace String in your SAS Programs

Making a List, Checking it Twice (Part 1): Techniques for Specifying and Validating Analysis Datasets

PharmaSUG China Paper 059

A Macro to Create Program Inventory for Analysis Data Reviewer s Guide Xianhua (Allen) Zeng, PAREXEL International, Shanghai, China

Cover the Basics, Tool for structuring data checking with SAS Ole Zester, Novo Nordisk, Denmark

CHAPTER 7 Examples of Combining Compute Services and Data Transfer Services

SAS ENTERPRISE GUIDE USER INTERFACE

Developing Data-Driven SAS Programs Using Proc Contents

The Impossible An Organized Statistical Programmer Brian Spruell and Kevin Mcgowan, SRA Inc., Durham, NC

Checking for Duplicates Wendi L. Wright

A SAS Macro Utility to Modify and Validate RTF Outputs for Regional Analyses Jagan Mohan Achi, PPD, Austin, TX Joshua N. Winters, PPD, Rochester, NY

Reading in Data Directly from Microsoft Word Questionnaire Forms

ABC Macro and Performance Chart with Benchmarks Annotation

CC13 An Automatic Process to Compare Files. Simon Lin, Merck & Co., Inc., Rahway, NJ Huei-Ling Chen, Merck & Co., Inc., Rahway, NJ

Better Metadata Through SAS II: %SYSFUNC, PROC DATASETS, and Dictionary Tables

Utilizing the VNAME SAS function in restructuring data files

Give me EVERYTHING! A macro to combine the CONTENTS procedure output and formats. Lynn Mullins, PPD, Cincinnati, Ohio

Real Time Clinical Trial Oversight with SAS

PharmaSUG Paper TT10 Creating a Customized Graph for Adverse Event Incidence and Duration Sanjiv Ramalingam, Octagon Research Solutions Inc.

MedDRA Dictionary: Reporting Version Updates Using SAS and Excel

Let SAS Help You Easily Find and Access Your Folders and Files

LST in Comparison Sanket Kale, Parexel International Inc., Durham, NC Sajin Johnny, Parexel International Inc., Durham, NC

WHAT ARE SASHELP VIEWS?

Don t Forget About SMALL Data

What's the Difference? Using the PROC COMPARE to find out.

SAS Macro Dynamics - From Simple Basics to Powerful Invocations Rick Andrews, Office of the Actuary, CMS, Baltimore, MD

An Efficient Tool for Clinical Data Check

KEPT IN TRANSLATION: AVOIDING DATA LOSS AND OTHER PROBLEMS WHEN CONVERTING JAPANESE DATA

So Much Data, So Little Time: Splitting Datasets For More Efficient Run Times and Meeting FDA Submission Guidelines

Going Under the Hood: How Does the Macro Processor Really Work?

SAS 9 Programming Enhancements Marje Fecht, Prowerk Consulting Ltd Mississauga, Ontario, Canada

Functions vs. Macros: A Comparison and Summary

Exporting Variable Labels as Column Headers in Excel using SAS Chaitanya Chowdagam, MaxisIT Inc., Metuchen, NJ

Data Manipulation with SQL Mara Werner, HHS/OIG, Chicago, IL

Taming a Spreadsheet Importation Monster

T.I.P.S. (Techniques and Information for Programming in SAS )

Virtual Accessing of a SAS Data Set Using OPEN, FETCH, and CLOSE Functions with %SYSFUNC and %DO Loops

How to Create Data-Driven Lists

A Mass Symphony: Directing the Program Logs, Lists, and Outputs

Quick and Efficient Way to Check the Transferred Data Divyaja Padamati, Eliassen Group Inc., North Carolina.

Syntax Conventions for SAS Programming Languages

PH006 Audit Trails of SAS Data Set Changes An Overview Maria Y. Reiss, Wyeth Pharmaceuticals, Collegeville, PA

Programming Gems that are worth learning SQL for! Pamela L. Reading, Rho, Inc., Chapel Hill, NC

Interactive Programming Using Task in SAS Studio

The Output Bundle: A Solution for a Fully Documented Program Run

Useful Tips When Deploying SAS Code in a Production Environment

PharmaSUG Paper CC02

The 'SKIP' Statement

Simplifying the Sample Design Process with PROC PMENU

Missing Pages Report. David Gray, PPD, Austin, TX Zhuo Chen, PPD, Austin, TX

Using the CLP Procedure to solve the agent-district assignment problem

Summary Table for Displaying Results of a Logistic Regression Analysis

Demystifying Inherited Programs

Moving Data and Results Between SAS and Excel. Harry Droogendyk Stratia Consulting Inc.

PREREQUISITES FOR EXAMPLES

%MISSING: A SAS Macro to Report Missing Value Percentages for a Multi-Year Multi-File Information System

Different Methods for Accessing Non-SAS Data to Build and Incrementally Update That Data Warehouse

While You Were Sleeping, SAS Was Hard At Work Andrea Wainwright-Zimmerman, Capital One Financial, Inc., Richmond, VA

Essential ODS Techniques for Creating Reports in PDF Patrick Thornton, SRI International, Menlo Park, CA

Validation Summary using SYSINFO

A Time Saver for All: A SAS Toolbox Philip Jou, Baylor University, Waco, TX

Write SAS Code to Generate Another SAS Program A Dynamic Way to Get Your Data into SAS

Run your reports through that last loop to standardize the presentation attributes

The Ins and Outs of %IF

Using DDE with Microsoft Excel and SAS to Collect Data from Hundreds of Users

Leveraging SAS Visualization Technologies to Increase the Global Competency of the U.S. Workforce

Paper PS05_05 Using SAS to Process Repeated Measures Data Terry Fain, RAND Corporation Cyndie Gareleck, RAND Corporation

Make the Most Out of Your Data Set Specification Thea Arianna Valerio, PPD, Manila, Philippines

Extending the Scope of Custom Transformations

SAS Macro Technique for Embedding and Using Metadata in Web Pages. DataCeutics, Inc., Pottstown, PA

Please Don't Lag Behind LAG!

Program Validation: Logging the Log

Presentation Goals. Now that You Have Version 8, What Do You Do? Top 8 List: Reason #8 Generation Data Sets. Top 8 List

SESUG Paper RIV An Obvious Yet Helpful Guide to Developing Recurring Reports in SAS. Rachel Straney, University of Central Florida

SD10 A SAS MACRO FOR PERFORMING BACKWARD SELECTION IN PROC SURVEYREG

EXAMPLE 3: MATCHING DATA FROM RESPONDENTS AT 2 OR MORE WAVES (LONG FORMAT)

SAS Data Integration Studio Take Control with Conditional & Looping Transformations

A SAS Macro for Producing Benchmarks for Interpreting School Effect Sizes

ST Lab 1 - The basics of SAS

Transcription:

Automating Comparison of Multiple Datasets Sandeep Kottam, Remx IT, King of Prussia, PA ABSTRACT: Have you ever been asked to compare new datasets to old datasets while transfers of data occur several times during the project life time, yet the dataset names, to compare are different and each dataset has different key variables? SAS macro facility along with metadata files with rules on how to compare can be used for orchestration of this process. The Rules metadata file contains information about filenames (old & new), key variables to be used to compare each dataset and variables not to compare for each dataset. Focus on the metadata file will result in the tool providing meaningful output to review. This paper will discuss the process and identify how it provides confidence in the output. INTRODUCTION: Data comparison is an important aspect in any project. This paper will discuss a macro to produce a single report after comparing datasets regardless of datasets being compared. MACRO OVERVIEW: The COMPARE macro has the following steps. 1.) The macro imports the Excel sheet. a.) The Excel consists of following details as shown in fig 1. I. The by variables to be used to compare the datasets. II. Drop variables that we don t want to compare in each dataset. III. If the dataset is to be compared or not. 2.) It creates macro variables of the dataset names, by variables, drop variables using call symput function. 3.) The do loop is used to process all the datasets at a time. Fig1: Excel sheet with specifications. 1

DATA: The libname statements are used to define the libraries. **First Data Tranfer libname trans1 "C:\nesug\datatransfer1" **Second Data Transfer libname trans2 "C:\nesug\datatransfer2" **Salary data of each empid data trans1.salary input empid position :$15. salary format empid z3. salary dollar10. datalines 001 Sr.Manager 100000 002 Manager 90000 003 Teamlead 80000 004 Sr.programmer 75000 005 programmer 60000 data trans2.salary1 input empid position :$15. salary format empid z3. salary dollar10. datalines 001 Sr.Manager 100000 002 Manager 90000 003 Teamlead 80000 004 Sr.programmer 75000 005 programmer 65000 **Bonus data of each empid data trans1.bonus input empid position :$15. salary percent format empid z3. salary dollar10. bonus=(percent/100)*salary datalines 001 Sr.Manager 100000 20 002 Manager 90000 15 003 Teamlead 80000 10 004 Sr.programmer 75000 4 005 programmer 60000 3 data trans2.bonus1 input empid position :$15. salary percent format empid z3. salary dollar10. bonus=(percent/100)*salary datalines 001 Sr.Manager 100000 20 002 Manager 90000 15 003 Teamlead 80000 10 004 Sr.programmer 75000 4 005 programmer 65000 3 **Shares data of each empid data trans1.shares input empid position :$15. salary percent shares format empid z3. salary dollar10. bonus=(percent/100)*salary datalines 001 Sr.Manager 100000 20 10000 002 Manager 90000 15 5000 2

003 Teamlead 80000 10 3000 004 Sr.programmer 75000 4 1000 005 programmer 60000 3 500 data trans2.shares1 input empid position :$15. salary percent shares format empid z3. salary dollar10. bonus=(percent/100)*salary datalines 001 Sr.Manager 100000 20 10000 002 Manager 90000 15 5000 003 Teamlead 80000 10 3000 004 Sr.programmer 75000 4 1000 005 programmer 65000 3 500 MACRO: %macro compare(lib1,lib2,xls) libname lib1 "&&lib1." libname lib2 "&&lib2." **Import Specifications for each dataset proc import datafile = "&xls" out = qc1 dbms = excel replace getnames = yes sheet="compare" **Create macro variables data qc set qc1(where=(dataset1^=' ' and qc='y')) retain i i + 1 call symput("dat" trim(left(put(i,3.))),trim(dataset1)) call symput("dat2" trim(left(put(i,3.))),trim(dataset2)) call symput("byvar" trim(left(put(i,3.))),trim(byvar)) call symput("qc" trim(left(put(i,3.))),trim(qc)) call symput("dropvar" trim(left(put(i,3.))),trim(dropvar)) call symput("tot",left(put(i,3.))) ***Print the total number of datasets being compared to log %put Total number of datasets=&tot **Output the results to Rtf file options orientation=landscape ods listing close ods noresults ods rtf file="c:\nesug\compare.rtf" **Run all the datasets at once %do i=1 %to &tot %if %upcase(&&dropvar&i.)^=none %then %do **Sort using the by varaibles before comparing and output to another dataset 3

proc sort data=lib1.&&dat&i. out=base&i.(drop=&&dropvar&i.) by &&byvar&i. proc sort data=lib2.&&dat2&i. out=compare&i.(drop=&&dropvar&i.) by &&byvar&i. %end %else %do **Sort using the by varaibles before comparing and output to another dataset proc sort data=lib1.&&dat&i. out=base&i. by &&byvar&i. %end proc sort data=lib2.&&dat2&i. out=compare&i. by &&byvar&i. ***Comparing the datasets title1 " Comparison is done for data in &&lib1. with &&lib2. " title2 " Base: &&dat&i Compare: &&dat2&i. " proc compare data=base&i. compare=compare&i. id &&byvar&i. %end ods rtf close ods results ods listing %MEND **Run Macro %compare(%str(c:\nesug\datatransfer1),%str(c:\nesug\datatransfer2), %str(c:\nesug\compare.xls)) OUTPUT: 4

CONCLUSIONS: As mentioned in above method, this compare macro is designed to make your life easier. You can use this code to use it over and over again. We can also use this to change to merge statement than using proc compare. REFERENCES: Art Carpenter s Carpenter's Complete Guide to the SAS Macro Language, 2 nd Edition. Marie Byrd Alexander, Sharon Avrunin-Becker s The Mighty Macro That Will Change How You Macro, NESUG- 2008. ACKNOWLEDGMENTS: SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies. Many thanks to Daphne Ewing at Auxilium Pharmaceuticals for her support and encouragement to complete this paper. CONTACT INFORMATION: Your comments and questions are valued and encouraged. Contact the author at: Sandeep Kottam Remx IT 201 Heritage ln Exton, PA-19341 Phone: 605-691-3274 kottamsandeep@gmail.com 5