The Power of PROC SQL Techniques and SAS Dictionary Tables in Handling Data

Paper PO31

MaryAnne DePesquo Hope, Health Services Advisory Group, Phoenix, Arizona
Fen Fen Li, Health Services Advisory Group, Phoenix, Arizona

ABSTRACT

This paper demonstrates the combined power of PROC SQL and SAS Dictionary Tables to assist in the data management of multi-year health care survey data. The survey data, collected yearly, usually require some modifications to fields and file names to adjust for year-to-year changes in survey administration. As in all programming aspects of a project, it is essential that the programming techniques are efficient and adaptable in the data handling processes. Structured Query Language (SQL) is a powerful database language that can be used to access SAS Dictionary Tables, which contain information about data files open in a SAS session. Examples presented in this paper demonstrate techniques of applying PROC SQL to the SAS Dictionary Tables, SASHELP.VCOLUMN and SASHELP.VTABLE. These techniques easily and quickly address survey data management tasks including renaming variables, label creation, conversion of variable characteristics, automating file lists and file comparisons.

INTRODUCTION

Health Services Advisory Group Inc. (HSAG) is Arizona's largest health care quality review organization. HSAG is currently working on a number of large-scale survey projects, including the Medicare Health Outcomes Survey (HOS). The Medicare HOS measures the physical and mental health status of Medicare beneficiaries in managed care settings. The Medicare HOS, sponsored by the Centers for Medicare & Medicaid Services (CMS), is administered annually to a randomly selected sample of Medicare Advantage (MA) Plan members from each applicable Medicare contract market area in the United States. A random sample of 1,000 individuals is selected at baseline from each MA Plan and then resurveyed in two years.
Challenges exist when multi-year data contain a large number of variables that change from year to year, or when it is necessary to compare a large number of data files at different group levels. SQL extract techniques have been implemented to more easily handle changing requirements and characteristics of the data. These techniques are used instead of hard coding key values, cutting and pasting sections of code, or using the conventional DATA and PROC step methods that may require many program steps and lengthy lines of code. The examples that follow show how SQL is used to extract and modify the data set information available in the SASHELP.VCOLUMN and SASHELP.VTABLE tables in order to rename variables, create labels, convert variable types, generate file lists and compare files.

SQL

PROC SQL is a database language that incorporates features that can simplify and consolidate coding requirements. Using these features results in fewer program steps and shorter lines of code when compared to the conventional DATA step and PROC step techniques. Some of the common usages of PROC SQL include joining tables, extracting data, grouping and ordering data, creating and modifying tables, subsetting data, and creating macro variables.

SASHELP DICTIONARY TABLES

SAS maintains a set of read-only dictionary tables, and the SASHELP library contains views of them. They are created automatically when a SAS session is started and are updated automatically throughout the session. These resources are meta tables (data about data) that provide a wealth of information about the current data files in the SAS session. The view tables are stored in the SASHELP library and prefixed with a V. They describe components of SAS data files such as columns, formats, indexes, macros, and tables. The COLUMN view table and the TABLE view table are the specific focus of this paper.
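
As a quick orientation, the views can be inspected directly before they are used in a program. The sketch below is an illustration only and is not part of the original examples. It assumes a library assigned as PLAN that contains a data set HDATA (the names used in the examples in this paper). SASHELP.VCOLUMN and SASHELP.VTABLE are views of DICTIONARY.COLUMNS and DICTIONARY.TABLES, which can also be queried directly inside PROC SQL.

```sas
proc sql;
   /* Write the column definitions of the dictionary table to the log */
   describe table dictionary.columns;

   /* Variable-level metadata for one data set.                    */
   /* PLAN and HDATA are assumed names; LIBNAME and MEMNAME values */
   /* are stored in uppercase in these views.                      */
   select name, type, length, label, varnum
      from sashelp.vcolumn
      where libname = 'PLAN' and memname = 'HDATA'
      order by varnum;

   /* File-level metadata for every data set in the same library */
   select memname, nobs, nvar, crdate, modate
      from sashelp.vtable
      where libname = 'PLAN' and memtype = 'DATA';
quit;
```

Running a query like this first makes it easier to confirm the exact spelling and case of the values that later WHERE clauses depend on.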

SASHELP VCOLUMN TABLE

The SASHELP.VCOLUMN table includes data set information at the variable level. Below are some examples of variables contained in the column view table [description (variable name)]:

   name (name)
   type (type)
   length (length)
   label (label)
   format (format)
   informat (informat)
   position (npos)
   order number in table (varnum)

SASHELP VTABLE TABLE

The SASHELP.VTABLE table includes data set information at the file level. Below is a list of some of the frequently used VTABLE variables [description (variable name)]:

   library name (libname)
   file name (memname)
   file type (memtype)
   number of observations (nobs)
   file label (memlabel)
   number of variables (nvar)
   file creation date (crdate)
   file modification date (modate)

EXAMPLES USING PROC SQL AND DICTIONARY TABLES

RENAMING VARIABLES

Survey data frequently have a large number of variables, and there is often a need to rename variables in the data set in order to merge data or to modify the variable names for input to generic programs. The name field in the SASHELP.VCOLUMN table is used to extract only the required variables from the data file. Example 1 uses PROC SQL and the SASHELP.VCOLUMN table to demonstrate the selection of variables to be renamed.

Example 1:

The first step is to execute PROC SQL to extract, from the SASHELP.VCOLUMN table, the selected variables of the stored SAS data set named HDATA (libname is PLAN). The WHERE clause is used to select all the numeric variables in the data set with the exception of the variable V1PATID. A macro variable called mnlist is created that contains a string of rename assignments. The string contains the original variable name, an equal sign and the new variable name, with each rename assignment delimited by a blank space. The substr function strips off the two-character prefix, and the suffix _MN is concatenated to each new variable name.
The separated by clause creates a space delimiter between the rename assignments, and compress removes any blanks from the original variable name. The trim function, applied after left, left justifies the new variable name and removes its trailing spaces.

   proc sql noprint;
      /* LIBNAME and MEMNAME values are stored in uppercase in SASHELP.VCOLUMN */
      select compress(name) || '=' || trim(left(substr(name,3))) || '_' || 'MN'
         into :mnlist separated by ' '
         from sashelp.vcolumn
         where libname='PLAN' and memname='HDATA'
           and type='num' and name not in ('V1PATID');
   quit;

The second step in the process is to run a data step using the macro variable &mnlist, which supplies the renaming code for the RENAME= data set option.

   data renmfile (rename=(&mnlist));
      set plan.hdata;
   run;

The log from the renmfile data step below shows the resolution of the macro variable mnlist.

   41   data renmfile (rename=(&mnlist));
   SYMBOLGEN:  Macro variable MNLIST resolves to V1HTH=HTH_MN V1HTHN=HTHN_MN V1VIG=VIG_MN
               V1MOD=MOD_MN V1LFT=LFT_MN V1CLMB=CLMB_MN V1CLMBN=CLMBN_MN V1BND=BND_MN
               V1WLK=WLK_MN V1WLKB=WLKB_MN
   42   set plan.hdata;

LABEL CREATION

One of the required data management tasks is to create Comma Separated Value (CSV) files from the SAS survey data and to create a label row for the variables in the CSV text file. Typing the labels directly into the CSV file is time consuming and prone to error. Example 2 illustrates the use of PROC SQL to quickly and accurately access the labels stored in the SAS data set using the SASHELP.VCOLUMN table, then transposing the captured labels and creating an EXCEL file that includes the label row.

Example 2:

PROC SQL is used to create a table named outds that contains the variable names (name) and the corresponding labels (label). These fields are extracted from the SASHELP.VCOLUMN table for the data set named HDATA. The selection is restricted to numeric variables with a length of 8 and excludes the variable V1PATID.

   proc sql;
      create table outds as
         select name, label
         from sashelp.vcolumn
         where libname="PLAN" and memname="HDATA"
           and type='num' and length=8
           and name not in ('V1PATID');
   quit;

Table Outds: Results of SQL Extraction

   Name       Label
   V1VAR08    First Variable Label 08
   V1VAR18    Second Variable Label 18
   V1VAR28    Third Variable Label 28

To produce the CSV file with each variable name and its corresponding label, the outds table is used as the input data set in the TRANSPOSE procedure. The name field values and the label field values are transposed into two rows, one for each field.

   proc transpose data=outds out=tr_outds (drop=_name_ _label_);
      var name label;
   run;

Table Tr_outds: Results of the Proc Transpose

        Col1                       Col2                        Col3
   1    V1VAR08                    V1VAR18                     V1VAR28
   2    First Variable Label 08    Second Variable Label 18    Third Variable Label 28

Next, the PROC EXPORT syntax is used to export the table tr_outds into the CSV text file addlabels.csv. This label EXCEL file is then concatenated to the EXCEL data file.
Another method would be to set the two SAS data sets, tr_outds and HDATA, before exporting to EXCEL format.

   proc export data=tr_outds
      outfile="C:\addlabels.csv"
      dbms=csv replace;
   run;

VARIABLE TYPE CONVERSION

Changing a large number of numeric variables into character variables, and vice versa, is a common process in health care survey data management. As shown in Example 3, numeric variables can be converted to character variables efficiently using the SASHELP.VCOLUMN table and PROC SQL.
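
The section above mentions conversion in both directions. For the reverse case (character to numeric), a minimal sketch is given here as an illustration only; it is not taken from the authors' programs. It assumes a hypothetical data set PLAN.HDATA_CHR whose character variables with names ending in 8 hold numeric text. The data set name, the N_ prefix, and the macro variable names cvars and nvars are all assumptions.

```sas
/* Hypothetical setup: PLAN.HDATA_CHR and the N_ prefix are assumed names */
proc sql noprint;
   select name, cats('N_', name)
      into :cvars separated by ' ',
           :nvars separated by ' '
      from sashelp.vcolumn
      where libname = 'PLAN' and memname = 'HDATA_CHR'
        and type = 'char' and name like '%8';
quit;

data numtype (drop=&cvars k);
   set plan.hdata_chr;
   array c_in{*}  &cvars;   /* existing character variables            */
   array n_out{*} &nvars;   /* new variables, numeric by default       */
   do k = 1 to dim(c_in);
      /* ?? suppresses invalid-data notes; unreadable text becomes missing */
      n_out{k} = input(c_in{k}, ?? best32.);
   end;
run;
```

If no variable matches the WHERE clause, the macro variables are not created and the data step fails, so checking the automatic macro variable &sqlobs after the query may be worthwhile.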

Example 3:

The first step is to create macro variables, using the WHERE clause to select variables of the data set called HDATA from the SASHELP.VCOLUMN table. The LIKE operator with the pattern '%8' is used to identify variables whose names end in 8. All the variables satisfying this criterion are included in the macro variables named chr1 and _chr1. The former contains a list of the selected variables separated by a space, and the latter contains a list of the same variables prefixed with an underscore.

   proc sql noprint;
      /* LIBNAME and MEMNAME values are stored in uppercase in SASHELP.VCOLUMN */
      select compress(name), "_" || compress(name)
         into :chr1  separated by ' ',
              :_chr1 separated by ' '
         from sashelp.vcolumn
         where libname="PLAN" and memname="HDATA" and name like '%8';
   quit;

Below is the result of using a %put to see the values in the two macro variables.

   137  %put &chr1 &_chr1;
   V1VAR08 V1VAR18 V1VAR28 _V1VAR08 _V1VAR18 _V1VAR28

The data step vartype uses the macro variables in the ARRAY statements, along with the put function, to convert the numeric variables into character variables. The left function left justifies each converted value, which the put function would otherwise right align. The dimension of 3 in the ARRAY statements matches the three variables selected above.

   data vartype (drop=&chr1 k);
      length &_chr1 $12;
      set plan.hdata;
      array n_vars{3} &chr1;     /* original numeric variables */
      array c_vars{3} &_chr1;    /* new character variables    */
      do k=1 to 3;
         c_vars{k}=left(trim(put(n_vars{k},12.9)));
      end;
   run;

Variable Type Conversion

   Before Conversion       After Conversion
   Variable    Type        Variable     Type
   V1VAR08     Num         _V1VAR08     Chr
   V1VAR18     Num         _V1VAR18     Chr
   V1VAR28     Num         _V1VAR28     Chr

FILE LIST GENERATION FOR MERGING

Each year more than 150 health care survey data files are distributed to health care plans nationwide. To ensure that the correct number of data files is generated for distribution, validation is required to match the electronic data files against a list of appropriate plans. Manually checking each electronic data file name against the plan list is feasible but labor intensive and prone to error.
The code in Example 4 demonstrates the use of PROC SQL combined with the SASHELP.VTABLE table to automate the validation process.

Example 4:

PROC SQL is run to access the list of SAS data file names stored in the SASHELP.VTABLE table for the library called PLANDATA. A table named filelist is created that contains this list of data file names.

   proc sql;
      /* LIBNAME values are stored in uppercase in SASHELP.VTABLE */
      create table filelist as
         select memname
         from sashelp.vtable
         where libname="PLANDATA";
   quit;

   Filelist Table (B)      Master Plan Table (A)
   MEMNAME                 PLANID
   AL_DATA                 AL_DATA
   AZ_DATA                 AZ_DATA
   CA_DATA                 CA_DATA
   CO_DATA                 CO_DATA
                           CT_DATA

The filelist table created in the previous code is compared to an existing SAS table, planlist. The following code uses a PROC SQL left join to merge the master table (planlist) with the previously created table filelist. The two fields used for the match-merge are planid, which is in the planlist (A) table, and memname, which is in the filelist (B) table. The left join specifies that all the observations in table A and only matching observations from table B are included in the result. Because the WHERE= data set option keeps only rows with a blank memname, the resulting table validation contains every plan in the master list that has no matching data file in the filelist table. The order by clause arranges the list alphabetically.

   proc sql;
      title 'Electronic Data File List Checking';
      create table validation (where=(memname=' ')) as
         select A.*, B.*
         from plan.planlist A left join filelist B
           on (A.planid=B.memname)
         order by planid;
   quit;

The result of the match-merge is below.

   Validation Table
   PLANID
   CT_DATA

FILE LIST GENERATION FOR AUTOMATIC FILE COMPARISON

Another of HSAG's tasks is to create text files for data distribution. In order to verify the accuracy of each text file generated, data are re-imported from the text file back to SAS and then compared to the original source SAS data file. (Note: the imported SAS data file and the SAS source data file have identical file names.) Because a large number of text files must be generated for each health plan, it is challenging to compare many pairs of data sets by name. Example 5 presents the code that has been developed to automate the data comparison process.

Example 5:

The source data files are stored in a libname called SOURCE and the imported SAS data files are stored in a libname called IMPORT. First, PROC SQL is used to extract the file names from the SASHELP.VTABLE table for the SOURCE library in alphabetic order.
This step creates a table, sourcelist, that contains a master list of the data set names.

   proc sql;
      /* LIBNAME values are stored in uppercase in SASHELP.VTABLE */
      create table sourcelist as
         select memname
         from sashelp.vtable
         where libname="SOURCE"
         order by memname;
   quit;

Next, using the data set sourcelist, the data step newfile executes a CALL SYMPUT. This statement takes the rank value from _N_ and assigns it to the macro variable datafile; after the last observation, datafile holds the number of data sets and drives the %do loop. The CALL SYMPUT within the %do loop captures each data file name (memname) and stores it in the macro variable dataname. The loop is processed once for each data set name: the data set in each library (source and import) is sorted by a key variable, and each pair of data sets is then compared using PROC COMPARE. The PROC COMPARE output is the validation that the content of the imported file matches the source file 100%.

   %macro autocmp;
      /* Number the data set names; DATAFILE ends up holding the count */
      data newfile;
         set sourcelist;
         rank = _n_;
         /* width 8 instead of 2 so that more than 99 files can be counted */
         call symput("datafile", trim(left(put(rank, 8.))));
      run;

      %do x = 1 %to &datafile;
         /* Pick up the name of the x-th data set */
         data _null_;
            set newfile;
            if rank=&x;
            call symput("dataname", trim(memname));
         run;

         proc sort data=source.&dataname;
            by V1PATID;
         run;
         proc sort data=import.&dataname;
            by V1PATID;
         run;
         proc compare base=source.&dataname compare=import.&dataname;
            id V1PATID;
         run;
      %end;
   %mend autocmp;

   %autocmp;

CONCLUSION

The SAS Dictionary Tables provide direct access to valuable information about SAS data sets available in a SAS session. Using PROC SQL with these tables offers a comprehensive and powerful method to reduce the coding time necessary to accomplish data handling and validation tasks. Applying the SQL techniques demonstrated in this paper can automate processes for easier and more efficient programming.

SPECIAL ACKNOWLEDGEMENTS

The authors would like to acknowledge the Medicare Health Outcomes Survey team at HSAG for review of this paper.

CONTACT INFORMATION

Your comments and questions are valued and encouraged.

   MaryAnne DePesquo Hope
   Health Services Advisory Group, Inc.
   1600 E. Northern Ave., Suite 100
   Phoenix, AZ 85020
   Work Phone: 602-745-6312
   Fax: 602-241-0757
   mhope@hsag.com

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.