Quick Data Definitions Using SQL, REPORT and PRINT Procedures
Bradford J. Danner, PharmaNet/i3, Tennessee

PharmaSUG 2012 Paper CC14

ABSTRACT

Prior to undertaking analysis of clinical trial data, in addition to reviewing the Protocol, Case Report Forms, and Statistical Analysis Plan, statistical programmers and biostatisticians need a basic understanding of the SAS data available. Such an understanding can be obtained readily using the DATASETS and CONTENTS procedures. To centralize this information in one place and make it more efficient to review, a quick method using the SAS DICTIONARY library with the SQL, REPORT and PRINT procedures is presented. Through dynamic, data-driven generation of macro control variables, a hyperlinked data definitions document may be produced quickly, in multiple formats, and with minimal user input.

INTRODUCTION AND BACKGROUND

Successful programming and analysis of a clinical trial requires a good understanding of both the protocol and the statistical analysis plan and, most importantly, of the data collected and available for the study. With regard to the SAS datasets, the size, number, and structure of the datasets available are all crucial pieces of information needed to get started with programming an analysis. Where the information comes from and where it resides is also important. Specifications can change within a study, or vary greatly between studies and therapeutic areas.

Therefore, in the absence of established documentation, a tool that quickly gathers all this information into one or two files at the initial stages of project work helps programmers and statisticians familiarize themselves with the SAS datasets. Informally, the DATASETS and CONTENTS procedures in SAS are well suited to presenting this information quickly throughout the programming process. Alternatively, much of this information also resides in the DICTIONARY library, accessible using the SQL procedure.
Furthermore, the flexibility that SQL provides for dynamically creating and storing macro variables offers a powerful way to access and present the information with little or no user input. After a few interim steps, the REPORT procedure presents the information so that individual users can navigate as needed using hyperlinks. An optional feature then uses the PRINT procedure to present a subset of the data itself.

METHODS AND RESULTS

The process by which the quick data definitions are built was intentionally designed to be as simple as possible, to maximize ease of use and minimize the time needed to produce a reviewable document. It may be summarized as follows:

1. User provides the location of the SAS datasets needed for review.
2. SQL procedure creates temporary datasets from DICTIONARY.TABLES and DICTIONARY.COLUMNS for use by the REPORT and PRINT procedures.
3. SQL procedure dynamically generates macro variables.
4. Create data definitions within one hyperlinked PDF file.
5. Optionally create a hyperlinked spreadsheet with example data.

The code necessary to perform these five steps may be found in the Appendix. The next few paragraphs go into further detail on specific aspects of Steps 2-5.

Once the user provides the location of the relevant data, temporary datasets are created to contain the information presented in the data definitions. For this simplified example, two datasets are created from DICTIONARY.TABLES and DICTIONARY.COLUMNS:

    create table TOC as
      select libname, memname, nobs, nvar, crdate
      from dictionary.tables
      where upcase(libname) in ("&libname.") and nobs gt 0;

    create table ALLVARS as
      select memname, name, type, length, varnum, label
      from dictionary.columns
      where upcase(libname) in ("&libname.")
        and memname in (select memname
                        from dictionary.tables
                        where upcase(libname) in ("&libname.") and nobs gt 0);
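Before deciding which columns to keep in TOC and ALLVARS, it can help to see what the two dictionary tables offer. The following is a minimal sketch, not part of the original macro: it uses the DESCRIBE TABLE statement of PROC SQL, which writes the column definitions to the SAS log, and then previews the dataset-level metadata for one library. The libref MYLIB is a hypothetical placeholder and is assumed to be assigned already.

```sas
/* Sketch only: MYLIB is a hypothetical libref assumed to be assigned already */
proc sql;
  /* Write the column definitions of the two dictionary tables to the log */
  describe table dictionary.tables;
  describe table dictionary.columns;

  /* Preview the dataset-level metadata for one library.
     LIBNAME values are stored in upper case in the dictionary tables. */
  select memname, nobs, nvar, crdate
    from dictionary.tables
    where libname = "MYLIB" and memtype = "DATA";
quit;
```

Any column listed in the log can be added to the SELECT lists that build TOC and ALLVARS.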

The temporary dataset TOC, created from DICTIONARY.TABLES, is used later in the process as a table of contents for the data definitions. Given its purpose, it is limited to a subset of the variables available in DICTIONARY.TABLES. The second temporary dataset, ALLVARS, is created as a subset of DICTIONARY.COLUMNS and acts as a repository for the information presented in each dataset-level portion of the data definitions. Again, only a subset of the variables available is kept for use in the document.

For the next step in the process, still within the SQL procedure, several macro variables are dynamically created so that the information can be presented iteratively within the definitions. The main information needed is the number of datasets present in the library location and the names of those datasets. Both may be quickly found and stored as macro variables using the SQL procedure and the temporary datasets already created, as follows:

    select n(memname) into :nset from work.toc;
    select memname into :domain1-:domain%left(&nset) from work.toc;

Because TOC was created with the qualifier nobs gt 0 in the WHERE clause of its SELECT statement, only datasets with at least one observation are considered in this example. Additional qualifiers to filter, or narrow, the scope of the data considered may be quickly added as needed.

The next step of the process uses the macro variables and the temporary datasets TOC and ALLVARS, all created in one SQL procedure, in combination with the REPORT procedure, to quickly and efficiently produce a one-stop document of data definitions. In this example, PDF is chosen for illustration, but other formats could be appropriate. The first portion of the document is a table of contents produced from the temporary dataset TOC.
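The macro-variable step described above can also be written without the %LEFT call. The following is a minimal sketch of that alternative, assuming SAS 9.3 or later and the WORK.TOC dataset created earlier; it is offered for illustration and is not the paper's code. It relies on the TRIMMED keyword and on an open-ended macro variable range in the INTO clause.

```sas
/* Sketch only: assumes WORK.TOC exists as created in the earlier step
   and contains at least one row */
proc sql noprint;
  /* TRIMMED (SAS 9.3 and later) strips the leading blanks from the count */
  select count(memname) into :nset trimmed
    from work.toc;

  /* Open-ended range (SAS 9.3 and later): one macro variable per row */
  select memname into :domain1-
    from work.toc;
quit;

/* Write the results to the log to confirm what was created */
%put NOTE: nset=&nset domain1=&domain1;
```

The %LEFT call in the original serves the same purpose as TRIMMED: without it, the leading blanks stored in &nset would break the name of the last macro variable in the range.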
Since the MEMNAME variable (the dataset name) is common to both TOC and ALLVARS, it is used to link each entry in the table of contents to the corresponding dataset-level portion of the document, and to link back to the table of contents. This is accomplished using the ANCHOR option in the ODS PDF statement and the COMPUTE and CALL DEFINE statements in the REPORT procedure, as follows:

    ods pdf anchor="page1";
    rtag="#"||strip(lowcase(memname));

Following the first REPORT procedure, the macro variables created previously in SQL are used to present each dataset in the library individually, with a target link from the table of contents and a link back to the table of contents at the bottom of each page. Again, this is accomplished using the ANCHOR option in the ODS PDF statement, coupled with a URL tag in the FOOTNOTE statement issued with the REPORT procedure:

    ods pdf anchor="%lowcase(%cmpres(&&domain&i))" startpage=now;
    ods proclabel="&&domain&i";
    title1 "contents of &&domain&i";
    proc report data=work.allvars (where=(memname = "&&domain&i" ))
                contents="&&domain&i" nowindows;
      column varnum name label type length;
      define <...>
      footnote1 "^S={URL='#page1'}Table of Contents";

Therefore, with one SQL procedure to create macro variables and a small number of temporary datasets, one REPORT procedure to create a broad table of contents for a location of interest, and one REPORT procedure within an iterative loop dependent upon the number of datasets of interest, a quick overview of the dataset structure is obtained, with navigability embedded automatically. The collection of procedures requires very little input from the user who is becoming familiar with the project data: only the location of the data and the desired location for the document.

In addition to quickly seeing the datasets and variables available for a given project, a closer look at the contents may also be necessary.
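Before turning to the data itself, the anchor-and-link mechanics described above can be sketched in one piece. The excerpts in the text omit the COMPUTE block and the iterative loop, so the version below fills them in as assumptions based on the description: the macro name link_sketch is hypothetical, and the sketch assumes that WORK.TOC, WORK.ALLVARS, &nset and the &domain1 to &&domain&nset macro variables exist, that ODS ESCAPECHAR is set to "^", and that an ODS PDF destination is already open with ANCHOR="page1".

```sas
/* Sketch only: link_sketch is a hypothetical macro name. Assumes WORK.TOC,
   WORK.ALLVARS, &nset and &domain1-&&domain&nset exist, ODS ESCAPECHAR="^",
   and an open ODS PDF destination with ANCHOR="page1". */
%macro link_sketch;

  /* Table of contents: each dataset name links to the anchor of its own page */
  proc report data=work.toc nowindows;
    column memname nobs nvar crdate;
    compute memname;
      rtag = "#" || strip(lowcase(memname));   /* for example #dm for DM */
      call define(_col_, "url", rtag);         /* make the cell a hyperlink */
    endcomp;
  run;

  /* One page per dataset, each starting at a matching lower-case anchor */
  %do i = 1 %to &nset;
    ods pdf anchor="%lowcase(%cmpres(&&domain&i))" startpage=now;
    ods proclabel="&&domain&i";
    title1 "contents of &&domain&i";
    /* Footnote links back to the anchor of the table of contents */
    footnote1 "^S={URL='#page1'}Table of Contents";
    proc report data=work.allvars(where=(memname = "&&domain&i")) nowindows;
      column varnum name label type length;
      define varnum / order order=internal noprint;
    run;
  %end;

%mend link_sketch;
```

The link target built in the COMPUTE block and the anchor set in the loop are both lower-cased, so the two match regardless of how the dataset name is stored.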
Therefore, an option to present the data itself was also included in the data definitions macro. It allows the user to create a linked document, in this example a spreadsheet with generic auto-filters in place. Each worksheet within the file contains one of the following: the table of contents, consistent with the definitions document; all the variables together with the dataset in which each is located; or a dataset itself, with only a subset of the data presented to limit the size of the file.

Using the same macro variables and temporary datasets created before, a spreadsheet file is created using the ODS ExcelXP tagset. Linkage between the table of contents and the individual worksheets is provided through COMPUTE and CALL DEFINE in the REPORT procedure and through the SHEET_NAME option of the tagset associated with each PRINT procedure, illustrated briefly here:

    ods tagsets.excelxp options(sheet_name="&&domain&i");
    proc print noobs data=&libname..&&domain&i (obs=10);

Using the same general process as for the data definitions in PDF format, the user now has a navigable spreadsheet with a subset of the data itself. When it is used in conjunction with a study protocol and analysis plan, the statistician or programmer can quickly become familiar with the data available.

CONCLUSIONS

The technique presented here offers statisticians and programmers a quick and easy-to-navigate tool for familiarizing themselves with the data available when beginning a project. The macro presented in the Appendix requires very little user input and can be rapidly deployed. Changing which information is selected from DICTIONARY.TABLES and DICTIONARY.COLUMNS requires only minor adjustments to the SQL SELECT statements. Adding navigability through embedded hyperlinks allows reviewers to get a quick overview of the data with which they will work.

REFERENCES AND RECOMMENDED READING

Carpenter, Arthur L., 2007, Carpenter's Complete Guide to the SAS REPORT Procedure, SAS Institute Inc., Cary, NC.
SAS 9.2 Online Documentation.

ACKNOWLEDGMENTS

I would like to thank my colleagues in the Biostatistics and Statistical Programming groups at PharmaNet/i3 who provided comments while drafting the process, especially Fernando Enriquez and Karl Miller. Their insights and guidance are appreciated.

CONTACT INFORMATION

Comments and questions are valued and encouraged. Please contact the author at:

Name: Bradford J. Danner
Enterprise: PharmaNet/i3
Address: 1787 Sentry Parkway West, Suite 300, Building 16
City, State ZIP: Blue Bell, PA 19422 USA
Work Phone: (615) 302-3608
E-mail: BDanner@pharmanet-i3.com

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

Other brand and product names are trademarks of their respective companies.

APPENDIX

    /* user needs to assign a path to SAS datasets */
    libname <user supplied> <user supplied> ;

    %macro definitions(libname=,destination=,example=);

    /* Build the temporary datasets and the macro control variables */
    proc sql noprint;
      create table TOC as
        select libname, memname, nobs, nvar, crdate
        from dictionary.tables
        where upcase(libname) in ("&libname.") and nobs gt 0;
      create table ALLVARS as
        select memname, name, type, length, varnum, label
        from dictionary.columns
        where upcase(libname) in ("&libname.")
          and memname in (select memname
                          from dictionary.tables
                          where upcase(libname) in ("&libname.") and nobs gt 0);
      select n(memname) into :nset from work.toc;
      select memname into :domain1-:domain%left(&nset) from work.toc;
    quit;

    /* PDF: table of contents, then one linked page per dataset */
    ods listing close;
    ods escapechar="^";
    /* destination is opened first so that the anchor applies to this file */
    ods pdf style=printer file="&destination.quick_dde.pdf" pdftoc=1;
    ods pdf anchor="page1";
    ods proclabel='list of datasets';
    title1 "list of datasets";
    proc report data=toc nowindows contents="dataset definitions";
      column memname nobs nvar crdate;
      define memname/display style={cellwidth=3cm just=l};
      define nobs/display style={cellwidth=3cm just=c};
      define nvar/display style={cellwidth=3cm just=c};
      define crdate/display style={cellwidth=4cm just=c};
      /* COMPUTE, CALL DEFINE and ENDCOMP reconstructed from the text */
      compute memname;
        rtag="#"||strip(lowcase(memname));
        call define(_col_,'url',rtag);
      endcomp;
    run;

    /* iterative loop reconstructed from the text */
    %do i=1 %to &nset;
      ods pdf anchor="%lowcase(%cmpres(&&domain&i))" startpage=now;
      ods proclabel="&&domain&i";
      title1 "contents of &&domain&i";
      proc report data=work.allvars (where=(memname = "&&domain&i" ))
                  contents="&&domain&i" nowindows;
        column varnum name label type length;
        define varnum/order noprint order=internal;
        define name/display style={cellwidth=4cm just=l};
        define label/display style={cellwidth=8cm just=l};
        define type/display style={cellwidth=4cm just=c};
        define length/display style={cellwidth=4cm just=c};
        footnote1 "^S={URL='#page1'}Table of Contents";
      run;
    %end;
    ods pdf close;
    ods listing;

    /* Optional spreadsheet with the first 10 observations of each dataset */
    %if &example=yes %then %do;
      ods listing close;
      ods tagsets.excelxp file="&destination.quick_data_view.xls" style=normal
          options(zoom='75' frozen_headers='yes' autofilter="all"
                  sheet_interval='proc' absolute_column_width="10"
                  sheet_name="datasets");
      proc report data=toc nowindows contents="dataset definitions";
        column memname nobs nvar crdate;
        /* placement of the blue style on MEMNAME is assumed */
        define memname/display style={color=blue};
        define nobs/display;
        define nvar/display;
        define crdate/display;
        /* COMPUTE, CALL DEFINE and ENDCOMP reconstructed from the text */
        compute memname;
          rtag="#"||strip(upcase(memname))||"!a1";
          call define(_col_,'url',rtag);
        endcomp;
      run;
      ods tagsets.excelxp style=normal options(sheet_name="variables");
      proc print label data=allvars noobs;
        var memname varnum name type length label;
      run;
      /* iterative loop reconstructed from the text */
      %do i=1 %to &nset;
        ods tagsets.excelxp style=normal options(sheet_name="&&domain&i");
        proc print noobs data=&libname..&&domain&i (obs=10);
        run;
      %end;
      ods tagsets.excelxp close;
      ods listing;
    %end;

    %mend definitions;
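A call to the macro might look like the following minimal sketch. The libref RAWDATA and both paths are hypothetical placeholders chosen for illustration; only the macro name and its three parameters come from the Appendix.

```sas
/* Sketch only: the libref and both paths are hypothetical placeholders */
libname rawdata "/studies/abc123/data/raw";

/* LIBNAME= is compared with UPCASE(libname), so pass the libref in upper case.
   DESTINATION= is prefixed directly to the output file names, so it should
   end with a path separator. */
%definitions(libname=RAWDATA,
             destination=/studies/abc123/docs/,
             example=yes);
```

With example=yes the call is expected to write both quick_dde.pdf and quick_data_view.xls to the destination folder; any other value for example produces only the PDF.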