A Quick and Easy Data Dictionary Macro Pete Lund, Looking Glass Analytics, Olympia, WA

Similar documents
Make Your Life a Little Easier: A Collection of SAS Macro Utilities. Pete Lund, Northwest Crime and Social Research, Olympia, WA

Validation Summary using SYSINFO

SAS 101. Based on Learning SAS by Example: A Programmer s Guide Chapter 21, 22, & 23. By Tasha Chapman, Oregon Health Authority

Uncommon Techniques for Common Variables

How to Create Data-Driven Lists

Advanced Tutorials. Paper More than Just Value: A Look Into the Depths of PROC FORMAT

Developing Data-Driven SAS Programs Using Proc Contents

Planting Your Rows: Using SAS Formats to Make the Generation of Zero- Filled Rows in Tables Less Thorny

Creating Case Report Tabulations (CRTs) for an NDA Electronic Submission

Learn to Please: Creating SAS Programs for Others

Utilizing SAS for Cross- Report Verification in a Clinical Trials Setting

Procedure for Stamping Source File Information on SAS Output Elizabeth Molloy & Breda O'Connor, ICON Clinical Research

The Power of PROC SQL Techniques and SAS Dictionary Tables in Handling Data

Paper B GENERATING A DATASET COMPRISED OF CUSTOM FORMAT DETAILS

e. That's accomplished with:

Merge Processing and Alternate Table Lookup Techniques Prepared by

Using Maps with the JSON LIBNAME Engine in SAS Andrew Gannon, The Financial Risk Group, Cary NC

A Cross-reference for SAS Data Libraries

OASUS Spring 2014 Questions and Answers

Paper HOW-06. Tricia Aanderud, And Data Inc, Raleigh, NC

Create a Format from a SAS Data Set Ruth Marisol Rivera, i3 Statprobe, Mexico City, Mexico

Automating the Creation of Data Definition Tables (Define.pdf) Using SAS Version 8.2

Storing and Reusing Macros

Fly over, drill down, and explore

SAS Catalogs. Definition. Catalog Names. Parts of a Catalog Name CHAPTER 32

A Better Perspective of SASHELP Views

Better Metadata Through SAS II: %SYSFUNC, PROC DATASETS, and Dictionary Tables

A Format to Make the _TYPE_ Field of PROC MEANS Easier to Interpret Matt Pettis, Thomson West, Eagan, MN

10 The First Steps 4 Chapter 2

Remembering the Past. Who Needs Documentation?

SAS Data Libraries. Definition CHAPTER 26

A SAS Macro Utility to Modify and Validate RTF Outputs for Regional Analyses Jagan Mohan Achi, PPD, Austin, TX Joshua N. Winters, PPD, Rochester, NY

My Reporting Requires a Full Staff Help!

STEP 1 - /*******************************/ /* Manipulate the data files */ /*******************************/ <<SAS DATA statements>>

ABSTRACT INTRODUCTION MACRO. Paper RF

EXAMPLES OF DATA LISTINGS AND CLINICAL SUMMARY TABLES USING PROC REPORT'S BATCH LANGUAGE

Data Quality Review for Missing Values and Outliers

Dictionary.coumns is your friend while appending or moving data

SAS Macro Dynamics - From Simple Basics to Powerful Invocations Rick Andrews, Office of the Actuary, CMS, Baltimore, MD

Abstract. Introduction. How Are All of These Tables Related? - Relational Database Map - RDB_MAP.SAS

ABSTRACT. Paper CC-031

Macros to Manage your Macros? Garrett Weaver, University of Southern California, Los Angeles, CA

Introduction. Getting Started with the Macro Facility CHAPTER 1

Keeping Track of Database Changes During Database Lock

Give me EVERYTHING! A macro to combine the CONTENTS procedure output and formats. Lynn Mullins, PPD, Cincinnati, Ohio

SAS Graphs in Small Multiples Andrea Wainwright-Zimmerman, Capital One, Richmond, VA

An Easy Route to a Missing Data Report with ODS+PROC FREQ+A Data Step Mike Zdeb, FSL, University at Albany School of Public Health, Rensselaer, NY

Version 6 and Version 7: A Peaceful Co-Existence Steve Beatrous and James Holman, SAS Institute Inc., Cary, NC

Know Thy Data : Techniques for Data Exploration

The REPORT Procedure CHAPTER 32

Basic Concept Review

A Practical and Efficient Approach in Generating AE (Adverse Events) Tables within a Clinical Study Environment

PhUSE US Connect 2018 Paper CT06 A Macro Tool to Find and/or Split Variable Text String Greater Than 200 Characters for Regulatory Submission Datasets

Compute Blocks in Report

TLFs: Replaying Rather than Appending William Coar, Axio Research, Seattle, WA

BASICS BEFORE STARTING SAS DATAWAREHOSING Concepts What is ETL ETL Concepts What is OLAP SAS. What is SAS History of SAS Modules available SAS

ABSTRACT MORE THAN SYNTAX ORGANIZE YOUR WORK THE SAS ENTERPRISE GUIDE PROJECT. Paper 50-30

Top-Down Programming with SAS Macros Edward Heaton, Westat, Rockville, MD

Tips & Tricks. With lots of help from other SUG and SUGI presenters. SAS HUG Meeting, November 18, 2010

An Efficient Method to Create Titles for Multiple Clinical Reports Using Proc Format within A Do Loop Youying Yu, PharmaNet/i3, West Chester, Ohio

Advanced PROC REPORT: Getting Your Tables Connected Using Links

PharmaSUG Paper PO12

Quick Data Definitions Using SQL, REPORT and PRINT Procedures Bradford J. Danner, PharmaNet/i3, Tennessee

Preserving your SAS Environment in a Non-Persistent World. A Detailed Guide to PROC PRESENV. Steven Gross, Wells Fargo, Irving, TX

Paper An Automated Reporting Macro to Create Cell Index An Enhanced Revisit. Shi-Tao Yeh, GlaxoSmithKline, King of Prussia, PA

Same Data Different Attributes: Cloning Issues with Data Sets Brian Varney, Experis Business Analytics, Portage, MI

PROC REPORT Basics: Getting Started with the Primary Statements

External Files. Definition CHAPTER 38

Presentation Goals. Now that You Have Version 8, What Do You Do? Top 8 List: Reason #8 Generation Data Sets. Top 8 List

Why Is This Subject Important? You Could Look It Up: An Introduction to SASHELP Dictionary Views. What Information is Listed in Dictionary Tables?

Compute; Your Future with Proc Report

A Dynamic Imagemap Generator Carol Martell, Highway Safety Research Center, Chapel Hill, NC

%check_codelist: A SAS macro to check SDTM domains against controlled terminology

APPENDIX 4 Migrating from QMF to SAS/ ASSIST Software. Each of these steps can be executed independently.

What Is SAS? CHAPTER 1 Essential Concepts of Base SAS Software

Compute Service: A RESTful Approach to the SAS Programming Environment

ABSTRACT: INTRODUCTION: WEB CRAWLER OVERVIEW: METHOD 1: WEB CRAWLER IN SAS DATA STEP CODE. Paper CC-17

PDF Multi-Level Bookmarks via SAS

Changes and Enhancements

SAS Visual Analytics Environment Stood Up? Check! Data Automatically Loaded and Refreshed? Not Quite

WHAT ARE SASHELP VIEWS?

What to Expect When You Need to Make a Data Delivery... Helpful Tips and Techniques

PROC FORMAT Tips. Rosalind Gusinow 2997 Yarmouth Greenway Drive u Madison, WI (608) u

Syntax Conventions for SAS Programming Languages

Using Data Set Options in PROC SQL Kenneth W. Borowiak Howard M. Proskin & Associates, Inc., Rochester, NY

BreakOnWord: A Macro for Partitioning Long Text Strings at Natural Breaks Richard Addy, Rho, Chapel Hill, NC Charity Quick, Rho, Chapel Hill, NC

SAS/ACCESS Interface to R/3

ABSTRACT INTRODUCTION THE GENERAL FORM AND SIMPLE CODE

CC13 An Automatic Process to Compare Files. Simon Lin, Merck & Co., Inc., Rahway, NJ Huei-Ling Chen, Merck & Co., Inc., Rahway, NJ

Tales from the Help Desk 6: Solutions to Common SAS Tasks

SAS Macro Language: Reference

Posters. Workarounds for SASWare Ballot Items Jack Hamilton, First Health, West Sacramento, California USA. Paper

SAS Programming Techniques for Manipulating Metadata on the Database Level Chris Speck, PAREXEL International, Durham, NC

Combining TLFs into a Single File Deliverable William Coar, Axio Research, Seattle, WA

Greenspace: A Macro to Improve a SAS Data Set Footprint

USING SAS SOFTWARE TO COMPARE STRINGS OF VOLSERS IN A JCL JOB AND A TSO CLIST

Using PROC REPORT to Cross-Tabulate Multiple Response Items Patrick Thornton, SRI International, Menlo Park, CA

Configuring SAS Web Report Studio Releases 4.2 and 4.3 and the SAS Scalable Performance Data Server

PROC REPORT AN INTRODUCTION

An Efficient Tool for Clinical Data Check

Transcription:

A Quick and Easy Data Dictionary Macro Pete Lund, Looking Glass Analytics, Olympia, WA ABTRACT A good data dictionary is key to managing even a modest-sized project. There are limited tools available in SAS to give a comprehensive, project- wide look at the contents of your datasets. This paper presents a macro that generates a browser-based data dictionary that goes beyond PROC CONTENTS in a number of ways. Any number of datasets can be passed to the macro and the results displayed in a table of contents-driven web page. Additionally, all user-written formats associated with dataset variables are hyperlinked, allowing the user to click on a format name and view the format values and labels. THE CALL The call to the macro is relatively simple. There are only four required parameters: PROJECT: a name for the dictionary that will be displayed as the Table of Contents header. This is simply a text string, for example, "Jail Data". HTMLPATH: the directory where the generated HTML code will be stored. This is just the directory name, "d :\projects\jail". FRAMENAME: the name of the HTML file that "drives" the process (the one to open). This is just the file name, with an.htm or.html extension, "jail_data.htm". DSLIST: the list of datasets to be included in the dictionary. Datasets in the list are separated by spaces. Entire libraries can be included with a libref. * reference. For example, "jail.* gen.sample". Using the examples above a call to the macro would be: %DataDictionary(project=Jail Data, htmlpath=d:\projects\jail, framename=jail_data.htm, dslist=jail. gen.sample) Additionally, dataset inclusion in the dictionary can be filtered based on the creation date of the dataset with following parameters: DATEMIN: begin date on which to keep datasets. DATEMAX: end date on which to keep datasets. THE OUTPUT The output of the macro is a number of HTML files. The file that will be viewed is referenced in the framename parameter and is stored in the directory referenced in the htmlpath parameter. Opening this file will display a two-frame file with a table of contents in the left frame and information about a dataset in the right frame. 7

There is one addional file stored in the htmlpath directory. This file contains the table of contents information and is named using the framename name with "_contents" added. In the example above, this file would be named, "jail_data_contents.htm". The table of contents contains an entry for each dataset referenced in the ds/ist parameter. Clicking on a dataset name will display information about that dataset. 1. JAlL.ElOOKING5V\>ITHSERIOUSCHARGE Dataset 2. JAIL.INCARCERATIONMONTHS Dataset 3. JAIL.LOS D;;,taset 4. JAIL.TABLELABELS Dll!aset 5. GEN.Sil.MPLE Dataset -- ---- firm~ ::: C al&rm!i'w'il' BOOI'\NO$'/~;,f'iiCv;(l<i,>oMlil f-'l'll'.h&il'l~tp~~ C'hM'UI:'II: 111111111- Qillllig.: :IMIIIII -... Additionally, If a variable has an associated user-written format, the format name is hyperlinked in the HTML output. Clicking on the format name will display the format values and labels. There is an HTML file for each format found in the datasets referenced in the dslist. They are stored in a directory called ''formats", under the htmlpath directory. The format HTML files are called "<libref>. <catalog>. <format>.htm". Clicking on the $JurFmt. Format link below There is an HTML file created for each dataset stored in a directory called "datasets", under the htm/path directory. The dataset HTML files are called "<libref>.<dataset>.htm". The dataset information is stored in an HTML table and contains dataset-level information: dataset name and label, physical location, creation date, size, number of variables, etc. It also contains variable-level information: variable name, type, length, format, index usage and label. would display the following web page: -- Rango nd Contents of format: $JURFMT. On catalog: JAIL.FORMATS) '--..... WA0170000 WA0170000 KC- KC. Pollee Dept WA0170100 WA0170100 KC- AUburn Ponce Dept WA017011C WA017011C KC- Rehabilitative Sei'Yices WA017013A WAD17013A KC- King County Prosecutor WA011013C WAIJ17013C KC Rehabilitative Services WA017013J WA017013J KC Seattle DlstrlctC()Urt WAA17n1fifi WAfl1701fifi K(';. l(lnn (";n. nthar In order for the format linking to work correctly, the SAS FMTSEARCH= option must be set with the same values that would be used If the datasets were being used. If not, formats may not be found or 8

a format from the wrong catalog may be linked. Output in this form allows the user quick access to information about all the datasets in a project. Access to formatted values of variables is also a significant enhancement over standard PROC CONTENTS output. HOW IT WORKS The process for creating the data dictionary files can be broken down into the following steps. Determine if Paths Exist The first step is to determine if the paths required for the process exist - this includes the path referenced in the htmlpath parameter and the /datasets and /formats subdirectories. If the directories do not exist they are created. This is done via a subordinate macro that is run for each of the three paths. There is a bunch of error handling in that macro (does the drive exist, is the fileref eight characters or less), but the crux of it comes down to a single line that is executed if the path does not exist: %let rc = %sysfunc(system(md "&dname")); The path (&dname) is passed to the operating system command MD (make directory) and the job is done. Flesh out the dataset list The dataset list that is passed in the &Dslist parameter may contain one-level names or wildcards (*). The process loops through all the datasets and checks for completeness of the name, i.e. a twolevel name. If any one-level names are passed a WORK. prefix is added. %do i = 1 %to &_Numos; %let _piece= %scan(&dslist,&i,%str( )); %if %index(&_piece,.) eq 0 %then %let _piece = WORK.&_piece; If any library-level wildcards are passed a list of all the datasets in the library is created by concatenating the libname and member names together in spaceseparated list. %if %index(&_piece,*) ne 0 %then %do; proc sql noprint; select compress(libnamell '.'I lmemname) into :Biglist separated by ' ' from sashelp.vstable where libname eq upcase ("%scan (&_piece, 1,.) "); When finished with this process a complete list of two-level dataset names is available. Get Variable Information From the COLUMNS and TABLES DICTIONARY tables information about the dataset variables is loaded into a dataset. The macro variable &NewDS contains a quoted, comma-separated list of datasets in the dataset list. proc sql noprint; create table DSinfo as select c.*,crdate from sashelp.vcolumn c, sashelp.vtable t where c.memname eq t.memname and c.libname eq t.libname &OateRange and c.libnamel I'.' I lc.memname in (&NewOS) order by upcase(c.name); A variable containing the path to a format HTML file is also created here. This variable will have the structure noted above in THE CALL section and will be 9

used to create the hyperlink to the format file. Additional dataset information is gathered from the MEMBERS DICTIONARY table. Get a List of the Available Formats The FMTSEARCH= value is grabbed with the GETOPTION function and the result is parsed into catalog names. %let FmtList=%sysfunc(getoption(fmtsearch)); Note: the value returned by the GETOPTION function contains parentheses around the FMTSEARCH value - these are stripped off in the actual code. Those catalog names are compared to the CATALOGS DICTIONARY table. The default catalog type.formats is added to the catalog name if there is no type in the catalog list. data _FmtList; length FormatEntry $100; %do 1 = 1 %to &NumCats; run; %let CatName = %scan(&fmtlist,&i,%str( )); %if %index(&catname,.) eq 0 %then %do; FormatEntry = &CatName'll ".FORMATS'; %else %do; FormatEntry = '&CatName'; ListPosition = &1; output; This dataset is then unduplicated by format name and position in the FMTSEARCH list. This will assure that if there are formats of the same name in multiple catalogs the correct one is pulled. This dataset is called _FmtHits. Match Formats to Variable List This list of all formats is compared to the list of formats associated with the variables in the datasets listed in the dslist parameter. Only requested formats are kept. When the variable information was gathered earlier a dataset of format information was created, _FormatsToFind. This dataset contains, among other things, the format name (ObjName) and the format type {ObyType). The type will be either character or numeric. proc sql; create table _FormatsFound as select h.*,format from _FmtHits h, _FormatsToFind f where h.objname eq f.objname and h.objtype eq f.objtype order by libname,memname; A macro variable "array" is created with all the format names. This list is looped through and a CNTLOUT dataset created from each format. The datasets are output with PROC SOL, using ODS to create HTML files. The ODS BODY parameter will create a file that has the name of the format catalog and format. For example, JAIL.FORMATS.JURFMT.HTM. ads html body= "formats \&&lib&i... &FmtName.. htm" path="&htmlpath" (url=none) style=datadictionary; proc format cntlout=_temp library=&&lib&i; runi select &FmtName; title 'Contents of format: &FmtName (in catalog: &&11b&i)'; 10

proc sql; select Start %if &Ranges gt 0 %then %str(label='range Start', End label='range End',); %else %str(label='value',); Label label='label' from _temp; The SOL generates a file with the start and end range values and the formatted value for the format. Note: an earlier step had determined whether the format contained any ranges or if all start and end values were the same. If there were no ranges (&Ranges eq 0) the output contains only the start and formatted values. Generate Dataset Output Finally, a PROC REPORT is run for each dataset in the dslist parameter. ODS is also used here to generate HTML files. proc report data=dsinfo nowd headline headskip split='*' contents='variable List'; column name fmtlink type length format FormatLib informat idxusage label; define name I order order=data 'Variable*Name' width=&maxlen; define type I display 'Type' width=6; define length I display 'Length' width=8; define format I display 'Format' width=10; define FormatLib 1 display 'Format catalog' width=15; define informat 1 display noprint; define idxusage 1 display 'Index usage' width=11; define label I display 'Label'; define fmtlink I display noprint; compute before _page_ 1 style=[just=left font_face=arial font_size=3 background=white foreground=red]; DSN = "Dataset Name: &memname") II "(in &path) ; DSL = "Dataset Label: "&memlabel"; line ' '; line @1 DSN $120.; line @1 DSL $120.; line ' '; endcomp; compute fmtlink; if fmtlink ne ' ' then call define ('format', "URL', fmtlink); endcomp; run; The new "URL" method in PROC REPORTs CALL DEFINE is used to create the link to the format HTML files. In the variable dataset the observation was flagged if it contained a user defined format- this was determined if the variable matched to a record in the list of formats created above. If so, a link is created using the value of the Fmtlink variable as the link location. This variable is created using the catalog and format name as was noted above. WHAT THE FUTURE HOLDS When working on a project like this, there are a number of enhancements that come to mind during the development process. First, if a format contains an embedded format as a label all that displays is the embedded format reference, for example, "[NewFmt.]". It would be preferable if that embedded link was also hyperlinked to the format referenced. This would require an iterative process for generating the format files, looking for embedded format names. Second, allow linking of additional metadata. If a user wants to maintain a separate file (or files) of information about datasets or variables there could be a mechanism for linking that information and displaying it in the data dictionary system. 11

AUTHOR CONTACT INFORMATION Pete Lund Looking Glass Analytics 215 Legion Way SW Olympia, WA 98501 (360) 528-8970 pete.lund@nwcsr.com www.lgan.com Copies of the macros needed for this process are available electronically upon request. TRADEMARK INFORMATION SAS is a registered trademark of SAS Institute Inc., Cary, NC, USA. 12