

Automating the Quantification of Healthcare Resource Utilization With Base SAS Software

Nancy Bourgeois, Manpower Technical, Research Triangle Park, NC
Bobbie Coleman, Glaxo Wellcome, Inc., Research Triangle Park, NC

ABSTRACT

Healthcare researchers often use prevalence, incidence, and rates of patient visits to physicians, hospitalizations and nursing home stays in the general population to better understand the healthcare resource utilization of diseases. These data contribute to the decision-making process regarding which diseases to treat or prevent to improve public health and to minimize healthcare costs. Making this information on healthcare resource utilization from three commonly used US healthcare surveys readily available lets epidemiologists within the company use these data without waiting for a programmer to write a program to provide it.

Computing Environments: NT 4.0
SAS Products: PC SAS 6.12
Other software products: Microsoft Word 97

INTRODUCTION

In order to make information from three commonly used US healthcare surveys available to epidemiologists and other interested groups in the company, we created a set of SAS programs and macros to generate reports in several formats that would be useful to these groups. The National Center for Health Statistics (NCHS) datasets that we are concerned with and the types of information they represent are:

1. NAMCS: National Ambulatory Medical Care Survey (Physician Visit Resource Utilization).
2. NHDS: National Hospital Discharge Survey (Hospital Resource Utilization).
3. NNHS: National Nursing Home Survey (Nursing Home Resource Utilization).

These programs can be extended to also report on mortality data using the Compressed Mortality File available from the CDC Web site. These programs can also be used as a model to quantify medication use and frequency of medical procedures with additional data available from the NAMCS and NHDS. We will not address those areas in this report.
These programs can be modified easily so that new reports can be generated each year as the latest NCHS datasets become available. The resource utilization information for this project is reported for each ICD9-CM diagnostic code (International Classification of Disease, 9th Revision, Clinical Modification) or for a specified grouping of these codes which represents a disease entity. In addition, each ICD9 code or code grouping is categorized under a specific Therapeutic Development Group (TDG) for use by specific areas within the company. These groupings and categorizations can be easily changed as the organization's structure changes or as the focus on specific diseases or disease groupings changes.

The examples shown in this paper focus on the NAMCS/Physician Visit data. The NHDS/Hospitalization and the NNHS/Nursing Home programs are very similar, with changes required for the different input datasets and the different numbers of diagnoses available.

STEP 1: EXTRACT REQUIRED DATA FROM THE NCHS DATASETS

The NCHS datasets are very large, and we require only a few diagnostic and weighting fields from each of the datasets for this project. The SAS programs to extract these fields must be edited whenever a new version of the NCHS datasets is available to make any necessary changes, such as the dataset name, the locations of the specific fields, and the record length of the dataset. Once editing is complete, the SAS programs are ready for processing. A sample of the code for the NAMCS/Physician Visit dataset follows. Similar code would be executed to extract the data needed from the NHDS and NNHS datasets to obtain the Hospitalization data and the Nursing Home data.

In each of the NCHS datasets, the ICD9 diagnosis codes are available in 3, 4 and 5 digits, representing the different levels of specificity for each diagnosis. For our organization, we required different specificity for different disease entities.
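As a minimal sketch of how one field can be read at several levels of specificity, the following step reads the same starting column three times with different informat widths. The records, column positions, and dataset and variable names here are made up for illustration and are not the NCHS file layout.

```sas
/* Illustration only: made-up records, not the NCHS file layout */
data demo;
  /* Re-read column 1 at three widths to get the 3-, 4- and 5-digit codes */
  input @1 dx3 $3.
        @1 dx4 $4.
        @1 dx5 $5.;
  datalines;
49390
25000
;
run;

proc print data=demo;
run;
```

For the first made-up record, dx3 holds 493, dx4 holds 4939 and dx5 holds 49390, so a later step can pick whichever level a disease entity needs.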
You can see in the NAMCS code below that we are extracting the 3, 4 and 5 digit diagnosis codes for all of the records at this point. In Step 4, there is a table that designates which level of diagnostic code is desired for each disease entity. This can be easily modified to fit the organization's needs. The result of this step is a SAS dataset containing only the diagnostic codes and weights from the NCHS dataset.

    /* NAMCS SAS Input Statements */
    /*--------------------------------------
      NAMCS.SAS
      Language: SAS for Windows v6.12
      In:  Namcs.dat
      Out: Namcs.SD2
      Create data file from NAMCS. Extract
      the diagnosis (3, 4 and 5 digit codes)
      and medication fields.
    +--------------------------------------*/
    /* Associate the fileref with the external data file for the NAMCS data */
    /* Change the location of the dataset */
    /* to pick data for desired year */
    /* The LRECL may also change */
    FILENAME in 'l:\namcs1997\namcs97.dat' LRECL=610;

    /* DBData libname is set in Setup.SAS */
    DATA DBData.namcs;
      INFILE in RECFM=F LRECL=610;
      INPUT @154 diagnos1 $4.
            @159 diagnos2 $4.
            @164 diagnos3 $4.
            @154 diag3dg1 $3.
            @159 diag3dg2 $3.
            @164 diag3dg3 $3.
            @154 diag5dg1 $5.
            @159 diag5dg2 $5.
            @164 diag5dg3 $5.
            @297 weightpt 6.;
      LABEL diagnos1='Four-digit diagnosis #1'
            diagnos2='Four-digit diagnosis #2'
            diagnos3='Four-digit diagnosis #3'
            diag3dg1='Three-digit diagnosis #1'
            diag3dg2='Three-digit diagnosis #2'
            diag3dg3='Three-digit diagnosis #3'
            diag5dg1='Five-digit diagnosis #1'
            diag5dg2='Five-digit diagnosis #2'
            diag5dg3='Five-digit diagnosis #3'
            weightpt='Patient Visit Weight';
    RUN;

STEP 2: SET UP THE ICD9 CODES AND DESCRIPTIONS

This step is needed to obtain the long descriptions for each ICD9 code, so that they can be printed in the final Resource Utilization reports. We obtained the ICD9 ASCII codes and descriptions file from Medicode, a division of Ingenix Publishing. If the codes and descriptors were obtained from other sources, the following code would have to be modified to point to the correct positions for the fields needed. This code assumes that the file containing this information is called ICD9.txt. After checking to be sure that the fields are in the correct locations, the following program is run to create a SAS dataset that contains all of the ICD9 codes with their long descriptions.

    /*--------------------------------------
      ICD9.SAS
      Language: SAS for Windows v6.12
      In:  ICD9.txt
      Out: ICD9.SD2
      Create the data file for the ICD9
      codes and long descriptions
    +--------------------------------------*/
    /* Associate the fileref with the external data file for the ICD9 data */
    FILENAME inicd9 'W:\DB\icd9.txt';

    /* DBData libname is set in Setup.SAS */
    DATA DBData.icd9;
      INFILE inicd9 RECFM=V;
      INPUT icd9full $ nothing $ icd9desc & $200.;
      LABEL icd9full='ICD9 Code with decimals'
            nothing='nothing'
            icd9desc='Description of ICD9 code';
    RUN;

    /*Create a field to contain the ICD9 code without the decimal
      points to correspond to the diagnostic codes in the NCHS datasets.*/
    data DBData.icd9 (drop=nothing);
      set DBData.icd9;
      attrib icd9cat length=$20 label='ICD9 Code without decimals';
      icd9cat = compress(icd9full,'.');
      output;
    run;

STEP 3: SET UP THE FORMATS FILE

The format file is used to define the formats for the TDGs. Edit the following program and change the TDG categories as needed. Run the program to set the formats for printing. In Step 4, the specific ICD9 codes and groups of codes are assigned to their respective TDGs for categorization.

    /*--------------------------------------
      DBFrmts.SAS
      Language: SAS for Windows v6.12
      Defines the formats for the TDG Group
      that each ICD9 code falls into.
    +--------------------------------------*/
    /* LIBRARY libref set in SETUP.SAS */
    /* Delete the existing formats in the library */
    proc datasets library=library memtype=catalog nolist;
      delete formats;
    quit;

    /* Define formats for all of the output variables and store
       them as permanent formats in the library */
    proc format library=library;
      /* Format for TDG Group */
      value $TDGfmt
        ' ' = ' '
        'R' = 'Respiratory'
        'H' = 'HIV/OI'
        'I' = 'ID/Hepatitis'
        'P' = 'Neurology and Psychiatry'
        'G' = 'GI/Metabolic/Musculoskeletal/Dermatology/Urogenital'
        'C' = 'Hospital Critical Care'
        'O' = 'Ophthalmology/Otology/Congenital Defects/Other';
      /* Format for abbreviated TDG Group */
      value TDGAfmt
        0 = ' '
        1 = 'R'
        2 = 'H'
        3 = 'I'
        4 = 'P'
        5 = 'G'
        6 = 'C'
        7 = 'O';
    run;
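One way to check the stored formats, sketched here under the assumption that the LIBRARY libref is already assigned (as in SETUP.SAS) and that the format program above has been run, is to list them with the FMTLIB option and then apply one with the PUT function. The variable names tdg and tdgname are illustrative only.

```sas
/* Assumes LIBRARY is assigned and $TDGfmt / TDGAfmt already exist */
proc format library=library fmtlib;
  select $TDGfmt TDGAfmt;   /* list only the two TDG formats */
run;

/* Apply the character format to a sample value and write it to the log */
data _null_;
  tdg = 'R';
  tdgname = put(tdg, $TDGfmt.);
  put tdg= tdgname=;
run;
```

Listing the formats after each edit of the TDG categories is a quick way to confirm that every code still maps to the intended group name before the reports are printed.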

STEP 4: SET UP THE TABLE CONTAINING THE ICD9 CODES AND GROUPS OF CODES NEEDED BY YOUR ORGANIZATION

The Word document (ICD9Table.doc) contains the information necessary to: select the desired ICD9 codes needed by your organization; group specific ICD9 codes into categories; set or change the descriptions for individual ICD9 codes or groups of codes; set the TDG for each code or group of codes; and indicate whether the codes should be grouped at the 3-digit, 4-digit, or 5-digit level. An example of this document is shown in Figure 1. Edit the document and make any necessary changes for your organization.

You can list one ICD9 code per line in the document or, if several ICD9 codes will be in the same TDG and will be reported with the same number of digits, the starting and ending ICD9 codes can be entered in the first two columns so that only one entry is required for similar codes. To group codes together into one disease entity in the final report, they must have the same "Group As" and "Group Name" information specified. They can be entered in the same row, if they are consecutive ICD9 codes, or they can be entered on multiple lines, as long as the "Group As" and "Group Name" information is the same.

Entering this information in a table in a Microsoft Word document gives a non-programmer the ability to change the ICD9 code groupings when necessary. This information is converted into a SAS dataset in the next step, which automates the change process, allowing faster turnaround time and less programmer involvement.

STEP 5: CONVERT THE WORD DOCUMENT TO A SAS DATASET

The Word document (ICD9Table.doc) created above must be converted into a SAS dataset to be used in subsequent steps for this project. The steps to do this conversion are as follows:

1. Open the ICD9Table.doc file in MS Word and select all of the ICD9 codes in the table. Be sure to select all columns.
2. Convert the table to text. Separate the text with Other: ! (exclamation point).
3. Choose Save As and save the file in the desired directory as ICD9TableExport.txt, so the original document will not be overwritten.
4. Delete any headers or footers.
5. Delete column headings.
6. Delete any blank paragraphs.
7. Select search, then select replace to replace all occurrences of 2 exclamation points (!!) with exclamation point, period, exclamation point (!.!). This will ensure that any empty fields are set to NULL (missing); without it, list input treats consecutive delimiters as a single delimiter and the remaining fields shift left.
8. Save the file.
9. Run the following SAS program to convert the text to a SAS dataset:

    /*--------------------------------------
      CreateICD9Table.SAS
      Language: SAS for Windows v6.12
      Creates the SAS dataset containing the
      ICD9 codes desired from a txt file with
      ! as the delimiter.
    +--------------------------------------*/
    /* DBData library set in SETUP.SAS */
    data DBData.icd9tabl;
      infile 'w:\db\icd9tableexport.txt' dlm='!' missover;
      input icd9low $ icd9high $ todelete $ tdg $
            numdigit type $ groupas : $25. groupnam : $200.;
    run;

STEP 6: RUN THE RESOURCE UTILIZATION PROGRAMS

Each of the resource utilization programs uses the ICD9 table created in the previous step to extract the desired diagnoses and corresponding weights from the NCHS datasets, as well as to group specific codes and categorize codes into their respective TDGs. We have included the Physician Visit program below to demonstrate this process. Similar code would be executed for the Hospitalization and Nursing Home programs.

Part I of this program creates a dataset containing all of the 3 digit diagnostic codes from the NAMCS dataset, as well as datasets containing the 4 digit and 5 digit diagnostic codes. Any duplicate diagnoses for a single visit are eliminated.

Part II pulls in the include file, which contains the code that processes the ICD9Table dataset listing the ICD9 codes desired and uses that data to extract those specific diagnoses from the NAMCS data. This code can be used without change for all of the other NCHS datasets.
Part III consists of several PROC SQL statements that group and summarize the ICD9 codes together and calculate totals and percentages for both the reliable and unreliable estimates. The macro CalcTot can be used unchanged for all of the NCHS datasets.

Part IV uses a common macro, GetICD9, to insert the ICD9 description for each ICD9 code.

    /*--------------------------------------
      Visit.SAS
      Language: SAS for Windows v6.12
      In:  Namcs.SD2
      Out: Visits.sd2
      Extract the records with the diagnosis
      codes desired from the NAMCS dataset.
    +--------------------------------------*/
    /* The DBData library is defined in SETUP.SAS */
    /* Set up a filename for the INCLUDE file that contains
       processing for the ICD9 Codes */
    FILENAME in1 'W:\DB\ICD9Code.sas';

    /*******************************/
    /*********** PART I ************/
    /*******************************/
    /*Extract the 3 digit ICD9 codes*/
    /*from all of the diagnoses */
    DATA Data3dg (keep=diagnos weightpt recnum);
      SET DBData.namcs;
      ATTRIB diagnos LENGTH=$3 LABEL='ICD-9 three-digit diagnosis code';
      ARRAY dxs {*} $ diag3dg1-diag3dg3;
      /*Extract all 3 diagnoses, skipping blank, V and E codes */
      DO i=1 TO DIM(dxs);
        IF dxs{i} ^= ' ' and SUBSTR(dxs{i},1,1) ^= 'V'
           and SUBSTR(dxs{i},1,1) ^= 'E' THEN DO;
          recnum = _N_;
          diagnos = dxs{i};
          OUTPUT;
        END;
      END;
    RUN;

    /*Eliminate the duplicate diagnoses for the same visit */
    proc sort data=Data3dg nodupkey;
      BY recnum diagnos;
    run;

    /*Extract the 4 digit ICD9 codes from all of the diagnoses */
    DATA Data4dg (keep=diagnos4 weightpt recnum);
      SET DBData.namcs;
      ATTRIB diagnos4 LENGTH=$4 LABEL='ICD-9 four-digit diagnosis code';
      ARRAY dxs4 {*} $ diagnos1-diagnos3;
      /*Extract all 3 diagnoses */
      DO i=1 TO DIM(dxs4);
        IF dxs4{i} ^= ' ' and SUBSTR(dxs4{i},1,1) ^= 'V'
           and SUBSTR(dxs4{i},1,1) ^= 'E' THEN DO;
          recnum = _N_;
          diagnos4 = dxs4{i};
          OUTPUT;
        END;
      END;
    RUN;

    /*Eliminate the duplicate diagnoses for the same visit */
    proc sort data=Data4dg nodupkey;
      BY recnum diagnos4;
    run;

    /*Extract the 5 digit ICD9 codes from all of the diagnoses */
    DATA Data5dg (keep=diagnos5 weightpt recnum);
      SET DBData.namcs;
      ATTRIB diagnos5 LENGTH=$5 LABEL='ICD-9 five-digit diagnosis code';
      ARRAY dxs5 {*} $ diag5dg1-diag5dg3;
      /*Extract all 3 diagnoses */
      DO i=1 TO DIM(dxs5);
        IF dxs5{i} ^= ' ' and SUBSTR(dxs5{i},1,1) ^= 'V'
           and SUBSTR(dxs5{i},1,1) ^= 'E' THEN DO;
          recnum = _N_;
          diagnos5 = dxs5{i};
          OUTPUT;
        END;
      END;
    RUN;

    /*Eliminate the duplicate diagnoses for the same visit */
    proc sort data=Data5dg nodupkey;
      BY recnum diagnos5;
    run;

    /*******************************/
    /*********** PART II ***********/
    /*******************************/
    /*Run the procedure to capture only the desired ICD9 Codes
      using the data from the MS Word table entered in Step 4.*/
    %INCLUDE in1;

    /*******************************/
    /*********** PART III **********/
    /*******************************/
    /*Create a dataset that sums the weights by ICD9 category*/
    PROC SQL;
      CREATE table DBData.Visits as
      SELECT icd9cat, typedata,
             max(tdgcat) AS TDGGroup,
             max(groupnam) AS icd9desc,
             (SUM(weightpt))/1000 AS VB,
             COUNT(icd9cat) AS RelVB
      FROM DataICD9
      GROUP BY icd9cat, typedata
      ORDER BY VB desc;
    quit;

    /*Calculate the totals and percentages for the entire
      dataset as well as totals and percentages by TDG */
    %CalcTot(DBData.Visits,VB)

    /*Generate a table containing the total of the
      unreliable estimates */
    PROC SQL;
      CREATE TABLE DBData.VBU as
      SELECT sum(vb) as VB,
             sum(relvb) as RelVB,
             sum(pct) as Pct
      FROM DBData.Visits
      WHERE RelVB < 30;
    quit;

    /*Generate a table containing the total of the
      unreliable estimates by TDG */
    PROC SQL;
      CREATE TABLE DBData.VBTDGU as
      SELECT max(tdggroup) as TDGGroup,
             sum(vb) as VB,
             sum(relvb) as RelVB,
             sum(tdgpct) as TDGPct
      FROM DBData.Visits
      WHERE RelVB < 30
      GROUP BY TDGGroup;
    quit;

    /*******************************/
    /*********** PART IV ***********/
    /*******************************/
    /*Concatenate the ICD9 full code and description to each
      record in the data file */
    %GetICD9(DBData.Visits,DBData.ICD9)

The following is the include file code (ICD9Code.sas) used by the previous Visit.sas code and by all of the other NCHS programs to extract the desired ICD9 codes and code groups from the NCHS data.

    /*--------------------------------------
      ICD9Code.SAS
      Language: SAS for Windows v6.12
      Creates temporary files for the 3 digit,
      4 digit and 5 digit codes desired from
      the Word table.
    +--------------------------------------*/
    /* DBData library is defined in SETUP.SAS */
    /*Make a table of all the 3 digit codes desired from the
      ICD9Table dataset.*/
    DATA temp3;
      SET DBData.icd9tabl;
      if numdigit = 3;
    run;

    /*Separate any groups of ICD9 codes into one record
      for each code */
    data temp3 (keep=diagnos GroupAs GroupNam Type TDG);
      set temp3;
      attrib low length = 3;
      attrib range length = 3;
      attrib diagnos length = $3;
      low = icd9low;
      range = icd9high - icd9low;
      do i = 0 to range;
        icd9low = low + i;
        diagnos = left(trim(icd9low));
        /* Pad short codes with leading zeros */
        if length(diagnos) = 1 then diagnos = '00' || diagnos;
        if length(diagnos) = 2 then diagnos = '0' || diagnos;
        output;
      end;
    run;

    /*Make a table of all the 4 digit codes desired from the
      ICD9Table dataset.*/
    DATA temp4;
      SET DBData.icd9tabl;
      if numdigit = 4;
    run;

    /*Separate any groups of ICD9 codes into one record
      for each code */
    data temp4 (keep=diagnos4 GroupAs GroupNam Type TDG);
      set temp4;
      attrib low length = 4;
      attrib high length = 4;
      attrib range length = 4;
      attrib diagnos4 length = $4;
      low = input(left(trim(icd9low)),4.);
      high = input(left(trim(icd9high)),4.);
      range = high - low;
      do i = 0 to range;
        icd9low = put(low + i,4.);
        diagnos4 = left(trim(icd9low));
        if length(diagnos4) = 1 then diagnos4 = '000' || diagnos4;
        if length(diagnos4) = 2 then diagnos4 = '00' || diagnos4;
        if length(diagnos4) = 3 then diagnos4 = '0' || diagnos4;
        output;
      end;
    run;

    /*Make a table of all the 5 digit codes desired from the
      ICD9Table dataset.*/
    DATA temp5;
      SET DBData.icd9tabl;
      if numdigit = 5;
    run;

    /*Separate any groups of ICD9 codes into one record
      for each code */
    data temp5 (keep=diagnos5 GroupAs GroupNam Type TDG);
      set temp5;
      attrib low length = 3;
      attrib range length = 3;
      attrib diagnos5 length = $5;
      low = icd9low;
      range = icd9high - icd9low;
      do i = 0 to range;
        icd9low = low + i;
        diagnos5 = left(trim(icd9low));
        if length(diagnos5) = 1 then diagnos5 = '0000' || diagnos5;
        if length(diagnos5) = 2 then diagnos5 = '000' || diagnos5;
        if length(diagnos5) = 3 then diagnos5 = '00' || diagnos5;
        if length(diagnos5) = 4 then diagnos5 = '0' || diagnos5;
        output;
      end;
    run;
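The range expansion and zero-padding performed by the steps above can also be sketched with the Zw. format, which writes a number with leading zeros. This is an alternative shown for illustration, not the approach used in ICD9Code.sas; the dataset names and the in-stream values are made up, and the low and high codes are read as numeric here rather than as character.

```sas
/* Made-up code ranges standing in for rows of the ICD9 table */
data ranges;
  input icd9low icd9high;
  datalines;
8 11
493 493
;
run;

/* One output record per code in each range, padded to 3 digits */
data expanded (keep=diagnos);
  set ranges;
  length diagnos $3;
  do code = icd9low to icd9high;
    diagnos = put(code, z3.);   /* e.g. 8 becomes 008 */
    output;
  end;
run;

proc print data=expanded;
run;
```

With these made-up rows the result holds 008, 009, 010, 011 and 493. The same idea with Z4. or Z5. would cover the 4 and 5 digit cases, at the cost of converting the codes to numeric first.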

    /* Process the 3 digit code datasets */
    /*Sort the 3 digit NCHS dataset by diagnosis code */
    proc sort data=data3dg;
      by diagnos;
    run;

    /*Sort the 3 digit ICD9Table dataset by diagnosis code */
    proc sort data=temp3;
      by diagnos;
    run;

    /*Merge the NCHS diagnoses with the file containing the
      desired codes to get only the 3 digit codes and code
      groups desired*/
    data Data3dg;
      merge Data3dg(in=in1) temp3(in=in2);
      by diagnos;
      if in1 and in2 then output;
    run;

    /* Process the 4 digit code datasets */
    /*Sort the 4 digit NCHS dataset by diagnosis code */
    proc sort data=data4dg;
      by diagnos4;
    run;

    /*Sort the 4 digit ICD9Table dataset by diagnosis code */
    proc sort data=temp4;
      by diagnos4;
    run;

    /*Since some of the 4 digit diagnoses fields in the NCHS
      dataset only have 3 digits recorded, we have to make sure
      that those diagnoses are extracted correctly, so 3 digit
      versions of the 4 digit datasets are created*/
    /*Make a dataset of the 4 digit NCHS diagnoses that only
      have 3 digits */
    data Only3;
      set Data4dg;
      if substr(diagnos4,4,1) = '-' then do;
        diagnos4 = substr(diagnos4,1,3);
        output;
      end;
    run;

    /*Make a dataset of the 4 digit ICD9Table codes desired,
      but keep only 3 digits*/
    data tempset3;
      set temp4;
      diagnos4 = substr(diagnos4,1,3);
    run;

    /*Get rid of duplicates*/
    proc sort data=tempset3 nodupkey;
      BY diagnos4;
    run;

    /*Merge the NCHS diagnoses with the file containing the
      desired codes to get only the 4 digit codes and code
      groups desired*/
    data Data4dg;
      merge Data4dg(in=in1) temp4(in=in2);
      by diagnos4;
      if in1 and in2 then output;
    run;

    /*Merge the 3 digit versions of the 4 digit NCHS diagnoses
      with the file containing the desired codes to get only the
      3 digit versions of the codes and code groups desired*/
    data Data4dgt;
      merge Only3(in=in1) tempset3(in=in2);
      by diagnos4;
      if in1 and in2 then output;
    run;

    /* Process the 5 digit code datasets */
    /*Sort the 5 digit NCHS dataset by diagnosis code */
    proc sort data=data5dg;
      by diagnos5;
    run;

    /*Sort the 5 digit ICD9Table dataset by diagnosis code */
    proc sort data=temp5;
      by diagnos5;
    run;

    /*Merge the NCHS diagnoses with the file containing the
      desired codes to get only the 5 digit codes and code
      groups desired*/
    data Data5dg;
      merge Data5dg(in=in1) temp5(in=in2);
      by diagnos5;
      if in1 and in2 then output;
    run;

    /*Create the final 3 digit code dataset */
    DATA DataICD9 (drop=diagnos recnum TDG Type);
      SET Data3dg;
      attrib icd9cat length=$20 label='ICD-9 Study Category';
      attrib tdgcat length=$3 label='TDG Group';
      attrib typedata length=$3 label='Type - Referent or Study';
      if GroupAs = ' ' then icd9cat = diagnos;
      else icd9cat = GroupAs;
      tdgcat = TDG;
      if Type = 'R' then typedata = 'RRR';
      else if Type = 'P' then typedata = 'PPP';
      else typedata = 'SSS';
    run;

    /*Create the final 4 digit code dataset */
    DATA DataICDb (drop=diagnos4 diagnos recnum TDG Type);
      SET Data4dg;
      attrib icd9cat length=$20 label='ICD-9 Study Category';
      attrib tdgcat length=$3 label='TDG Group';
      attrib typedata length=$3 label='Type - Reference or Study';
      attrib diagnos length=$3 label='3 digit diagnosis code';
      /*Set diagnos to the 3 digit code for comparisons */
      diagnos = substr(diagnos4,1,3);
      /*Remove the dash from the end of the 4 digit code*/
      if substr(diagnos4,4,1) = '-' then
        diagnos4 = substr(diagnos4,1,3);
      if GroupAs = ' ' then icd9cat = diagnos4;
      else icd9cat = GroupAs;
      tdgcat = TDG;
      if Type = 'R' then typedata = 'RRR';
      else if Type = 'P' then typedata = 'PPP';
      else typedata = 'SSS';
    run;

    /*Append the 4 digit codes selected to the 3 digit codes */
    proc append base=dataicd9 data=dataicdb;
    run;

    /*Create the final 3 digit version of */
    /*the 4 digit code dataset */
    DATA DataICDd (drop=diagnos4 recnum TDG Type);
      SET Data4dgt;
      attrib icd9cat length=$20 label='ICD-9 Study Category';
      attrib tdgcat length=$3 label='TDG Group';
      attrib typedata length=$3 label='Type - Referent or Study';
      if GroupAs = ' ' then icd9cat = diagnos4;
      else icd9cat = GroupAs;
      tdgcat = TDG;
      if Type = 'R' then typedata = 'RRR';
      else if Type = 'P' then typedata = 'PPP';
      else typedata = 'SSS';
    run;

    /*Append the 3 digit versions of the 4 digit codes
      selected to the 3 digit codes */
    proc append base=dataicd9 data=dataicdd;
    run;

    /*Create the final 5 digit code dataset */
    DATA DataICDc (drop=diagnos4 diagnos5 recnum TDG Type);
      SET Data5dg;
      attrib icd9cat length=$20 label='ICD-9 Study Category';
      attrib tdgcat length=$3 label='TDG Group';
      attrib typedata length=$3 label='Type - Reference or Study';
      attrib diagnos4 length=$4 label='4 digit diagnosis code';
      /*Set diagnos4 to the 4 digit code for comparisons */
      diagnos4 = substr(diagnos5,1,4);
      /*Remove the dash from the end of the 5 digit code*/
      if substr(diagnos5,5,1) = '-' then
        diagnos5 = substr(diagnos5,1,4);
      if GroupAs = ' ' then icd9cat = diagnos5;
      else icd9cat = GroupAs;
      tdgcat = TDG;
      if Type = 'R' then typedata = 'RRR';
      else if Type = 'P' then typedata = 'PPP';
      else typedata = 'SSS';
    run;

    /* Append the 5 digit codes selected to the 3 and 4
       digit codes */
    proc append base=dataicd9 data=dataicdc;
    run;

    /*Delete temporary datasets*/
    PROC DATASETS LIBRARY=WORK;
      DELETE Data3dg Data4dg Data5dg DataICDb DataICDc;
    QUIT;

STEP 7: RUN THE REPORT PROGRAMS

The SAS programs that generate the Resource Utilization reports rely on macros that can be used for the Visit, Hospitalization and Nursing Home datasets. Prior to running the report programs, the print macros must be run so that they are available for use by the report printing programs.
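One way to make the print macros available before a report program runs is to include the macro file at the top of that program, following the same FILENAME and %INCLUDE pattern that Visit.sas uses for ICD9Code.sas. This is only a sketch: the fileref name prtmac and the directory W:\DB are assumptions, since the paper does not say where PrintMacros.SAS is stored.

```sas
/* Assumed location and fileref name; adjust to where PrintMacros.SAS is kept */
FILENAME prtmac 'W:\DB\PrintMacros.sas';

/* Compile the print macros so later report steps can call them */
%INCLUDE prtmac;
```

Because %INCLUDE only compiles the macro definitions, nothing is printed at this point; the reports are produced when a report program later calls one of the macros.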

These macros set up and print the different formatted reports. The sample code for these macros follows: /* --------------------------------------- PrintMacros.SAS Language: SAS for Windows v6.12 Contains all of the macros used for printing reports in the Resource Utilization Project +-------------------------------------*/ /*--------------------------------------- Macro Name: FixDesc Macro used in the Print programs to split the ICD9 description into separate lines of 45 characters each. +--------------------------------------*/ %MACRO FixDesc(intext,type) /*Break the ICD9 description up into shorter fields for printing*/ array desc{5} $ 45 tempdesc = &intext if &type = PPP then tempdesc = **Pain** tempdesc i = 1 do until (tempdesc = ) if length(tempdesc) > 45 then do desc{i} = substr(tempdesc,1,45) tempdesc = substr(tempdesc,46) /* Make sure words are not split across lines */ if substr(desc{i},45,1) ^= and substr(tempdesc,1,1) ^= then do j = 45 do until (substr(desc{i},j,1) = ) tempdesc = substr(desc{i},j,1) tempdesc desc{i} = substr (desc{i},1,length(desc{i})-1) j = j-1 else do desc{i} = tempdesc tempdesc = i = i + 1 %MEND FixDesc /* --------------------------------------- PrintAll.SAS In: Utilization dataset, Unreliable totals dataset Out: Word document Create a Word document with the desired Utilization data. 
    +--------------------------------------*/
    %MACRO PrintAll(RUData,UData,RUtil,Reliable);
    options nodate nonumber nocenter pagesize=1000;
    title;

    /*Sort the dataset by descending Utilization */
    proc sort data=&rudata;
        by descending &RUtil;
    run;

    /*Create the Word document, eliminating the unreliable records */
    /*The VBU dataset contains the one line total for unreliable results*/
    /*which will be printed as the last line in the report */
    data _null_;
        file &dname; /*dname is set by calling program*/
        set &RUData(in=inRU) &UData;
        /*If this is the first line, then print Total line*/
        if _n_ = 1 then do;
            put / @1 'Total' @80 Tot COMMA10.2 @95 '100.00%' /;
        end;
        /*Print the records from the reliable data*/
        if inru then do;
            if &Reliable >= 30 then do; /* Do not print record if unreliable */
                /*Split the ICD9 description into lines that are 45 chars long */
                %FixDesc(icd9desc,typedata)
                put @1 typedata $3. @5 icd9full $20. @30 desc{1} $45.
                    @80 TDGGroup $1. @84 &RUtil COMMA10.2 @99 Pct PERCENT8.2;
                /*Print the rest of the ICD9 descriptions on separate lines */
                if desc{2} ^= ' ' then put @26 desc{2};
                if desc{3} ^= ' ' then put @26 desc{3};
                if desc{4} ^= ' ' then put @26 desc{4};
                if desc{5} ^= ' ' then put @26 desc{5};
            end;
        end;
        /*Otherwise, this is the total unreliable line from the dataset.*/

        /* print it with a description of what it represents */
        else put / @1 'All remaining conditions for which reliable estimates could not be obtained'
                 @80 &RUtil COMMA10.2 @95 Pct PERCENT8.2;
    run;
    %MEND PrintAll;

    /*----------------------------------------
    PrintTDG.SAS
    In: Utilization dataset, Unreliable totals dataset
    Out: Word document
    Create a Word document with the desired Utilization data by TDG.
    +--------------------------------------*/
    %MACRO PrintTDG(RUData,UData,RUtil,Reliable);
    options nodate nonumber nocenter pagesize=1000 linesize=256;

    /*Sort the dataset by TDG Group and descending Utilization */
    proc sort data=&rudata;
        by TDGGROUP descending &RUtil;
    run;

    /*Sort the unreliable dataset by TDG Group*/
    proc sort data=&udata;
        by TDGGROUP;
    run;

    /*Create the Word document*/
    data _null_;
        file &dname print notitle; /*dname is set by calling program*/
        set &RUData;
        by TDGGroup;
        /*Print TDG Total line*/
        if first.tdggroup then do;
            put @1 'TDG=' @5 TDGGroup $TDGfmt. @76 TDGTot COMMA10.2 @91 '100.00%' /;
        end;
        if &Reliable >= 30 then do; /* Do not print record if unreliable */
            /*Split the ICD9 description into lines that are 45 chars long */
            %FixDesc(icd9desc,typedata)
            /*Print the records*/
            put @1 typedata $3. @5 icd9full $20. @30 desc1 $45.
                @80 &RUtil COMMA10.2 @95 TDGPct PERCENT8.2;
            /*Print the rest of the ICD9 descriptions on separate lines */
            if desc{2} ^= ' ' then put @26 desc{2};
            if desc{3} ^= ' ' then put @26 desc{3};
            if desc{4} ^= ' ' then put @26 desc{4};
            if desc{5} ^= ' ' then put @26 desc{5};
        end;
        /*This is the total unreliable line from the Unreliable dataset.*/
        /* Print it with a description of what it represents */
        if last.tdggroup then do;
            set &UData;
            put / @1 'All remaining conditions for which reliable estimates could not be obtained'
                @76 &RUtil COMMA10.2 @91 TDGPct PERCENT8.2 _page_;
        end;
    run;
    %MEND PrintTDG;

    /*--------------------------------------
    PrintUnr.SAS
    In: Utilization dataset, Unreliable totals dataset
    Out: Word document
    Create a Word document with the unreliable Utilization data by TDG.
    +--------------------------------------*/
    %MACRO PrintUnr(RUData,RUtil,Reliable);
    options nodate nonumber nocenter pagesize=1000 linesize=256;

    /*Sort the dataset by TDG Group and ICD9 category */
    proc sort data=&RUData;
        by TDGGROUP icd9cat;
    run;

    /*Create the Word document*/
    data _null_;
        file &dname print notitle; /*dname is set by calling program*/
        set &RUData;
        by TDGGroup;
        /*Print TDG line*/
        if first.tdggroup then do;
            put @1 'TDG=' @5 TDGGroup $TDGfmt.;
        end;
        if &Reliable < 30 then do; /* Print record if unreliable */

            /*Split the ICD9 description into lines that are 45 chars long */
            %FixDesc(icd9desc,typedata)
            /*Print the records*/
            put @1 typedata $3. @5 icd9full $20. @30 desc1 $45.;
            /*Print the rest of the ICD9 descriptions on separate lines */
            if desc{2} ^= ' ' then put @26 desc{2};
            if desc{3} ^= ' ' then put @26 desc{3};
            if desc{4} ^= ' ' then put @26 desc{4};
            if desc{5} ^= ' ' then put @26 desc{5};
        end;
        if last.tdggroup then do;
            put _page_;
        end;
    run;
    %MEND PrintUnr;

    /* ------------------------------------
    PrintTot.SAS
    In: Utilization dataset
    Out: Word document
    Create a Word document with the summary totals and percentages
    +-------------------------------------*/
    %MACRO PrintTot(RUData);
    options nodate nonumber nocenter pagesize=1000 linesize=256;
    title;

    /*Sort the dataset by TDG */
    proc sort data=&rudata;
        by TDGGroup;
    run;

    /*Get one record from each TDG (they all have the total for the TDG */
    /*as well as the overall total) */
    /*Calculate the percentage for each TDG of the overall total */
    data temp(keep=tot tdgtot pct TDGGroup);
        set &RUData;
        by TDGGroup;
        if first.tdggroup then do;
            pct = tdgtot/tot;
            output;
        end;
    run;

    /*Create the Word document, printing the grand total on the first line,*/
    /*and the subtotals and percentages for each TDG next */
    data _null_;
        file &dname; /*dname is set by calling program*/
        set temp end=last;
        by TDGGroup;
        /*If this is the first line, then print Total line*/
        if _n_ = 1 then do;
            put / @1 'Total' @60 Tot COMMA10.2 @75 '100.00%' /;
        end;
        put @1 TDGGroup $TDGfmt. @60 tdgtot COMMA10.2 @75 pct PERCENT8.2;
    run;
    %MEND PrintTot;

The following program, PrintVB, prints four different Resource Utilization reports for the Physician Visits. The first report orders the Visit data from the largest utilization to the smallest. The next report does the same thing, but groups the ICD9 codes into their respective TDGs. The third report lists all of the unreliable estimates by ICD9 code. The final report is a summary of the Physician Visit data for each TDG as well as an overall total for the Visit Resource Utilization.
Similar programs could be written to report on the Hospitalization, Nursing Home and Mortality data by changing the names of the input and output datasets. This code automatically generates the Word documents, which will be formatted in a later step.

    /* -------------------------------------
    PrintVB.SAS
    Language: SAS for Windows v6.12
    In: Visits.SD2, VBU.sd2, VBTDGU.sd2
    Out: VB.doc VBTDG.doc VBTDGUnreliable.doc VBSummary.doc
    Create the Word documents with the desired Physician Visit data.
    +------------------------------------- */

    /* The DBData library is defined in SETUP.SAS*/

    /*Print the Visit Resource Utilization document by descending utilization*/
    %let dname = 'w:\db\vb.doc';
    %PrintAll(DBData.Visits,DBData.VBU,VB,RelVB)

    /*Print the Visit utilization by TDG and descending Visit utilization*/
    %let dname = 'w:\db\vbtdg.doc';

    %PrintTDG(DBData.Visits,DBData.VBTDGU,VB,RelVB)

    /*Print the unreliable Visit estimates by TDG and descending Visit utilization*/
    %let dname = 'w:\db\vbtdgunreliable.doc';
    %PrintUnr(DBData.Visits,VB,RelVB)

    /*Print the totals and percentages for each TDG*/
    %let dname = 'w:\db\vbsummary.doc';
    %PrintTot(DBData.Visits)

STEP 8: FORMAT THE REPORTS

The report documents generated by the SAS programs in the previous step contain only the data for the report, without any formatting. For each report, there is a corresponding Word document skeleton that contains the formatting for that particular report, including report and column headings, footers and the correct margin settings. If a different format is desired, edit the skeleton document and make the changes. An example of the Word document skeleton file for the first Physician Visit report can be found in Figure 2.

To create the final report for each of the report documents generated by the SAS programs:

1. Open the SAS-generated Word document containing the data for the report.
2. Run the BoldFix macro to change the font and font size, bold certain ICD9 codes of interest and remove the column containing the Type of code field. The code for the BoldFix Word macro is shown below.
3. Open the corresponding Word document skeleton for that particular report and copy all of the data from the SAS-generated report into the skeleton file. Save the file with a new name so that the skeleton file remains unchanged for later use.

The report is now ready for publication.
The following is the code for the MS Word macro BoldFix used in step 2 above:

    Sub BoldFix()
    '
    ' BoldFix Macro
    ' Change Font and Font Size
    '
        ActiveDocument.Select
        With Selection.Font
            .Name = "Courier New"
            .Size = 8
        End With
        Selection.ParagraphFormat.SpaceAfter = 3

    ' Remove the RRR code and set line to bold
        For i = 1 To 50000
            Selection.Find.ClearFormatting
            With Selection.Find
                .Text = "RRR"
                .Replacement.Text = ""
                .Forward = True
                .Wrap = wdFindStop
                .Format = False
                .MatchCase = True
                .MatchWholeWord = True
                .MatchWildcards = False
                .MatchSoundsLike = False
                .MatchAllWordForms = False
                .Execute
                If .Found = True Then
                    Selection.Delete
                    Selection.Paragraphs(1).Range.Select
                    With Selection.Font
                        .Bold = True
                    End With
                    Selection.MoveDown Unit:=wdParagraph, Count:=1, Extend:=wdMove
                Else
                    ' No more matches: force the loop to end
                    i = 50001
                End If
            End With
        Next i
        Selection.GoTo What:=wdGoToLine, Which:=wdGoToAbsolute, Count:=1

    ' Remove the SSS code
        For i = 1 To 50000
            Selection.Find.ClearFormatting
            With Selection.Find
                .Text = "SSS"
                .Replacement.Text = ""
                .Forward = True
                .Wrap = wdFindStop
                .Format = False
                .MatchCase = True
                .MatchWholeWord = True
                .MatchWildcards = False
                .MatchSoundsLike = False
                .MatchAllWordForms = False
                .Execute
                If .Found = True Then
                    Selection.Delete
                    Selection.MoveDown Unit:=wdParagraph, Count:=1, Extend:=wdMove
                Else
                    i = 50001
                End If
            End With
        Next i
        Selection.GoTo What:=wdGoToLine, Which:=wdGoToAbsolute, Count:=1

    ' Remove the PPP code
        For i = 1 To 50000
            Selection.Find.ClearFormatting
            With Selection.Find
                .Text = "PPP"
                .Replacement.Text = ""
                .Forward = True
                .Wrap = wdFindStop
                .Format = False
                .MatchCase = True
                .MatchWholeWord = True
                .MatchWildcards = False
                .MatchSoundsLike = False
                .MatchAllWordForms = False
                .Execute
                If .Found = True Then
                    Selection.Delete
                    Selection.MoveDown Unit:=wdParagraph, Count:=1, Extend:=wdMove
                Else
                    i = 50001
                End If
            End With
        Next i

    End Sub

CONCLUSION

With this process, we can provide resource utilization data for all diseases or disease categories in one document that can be easily updated with new NCHS data each year. The epidemiologist then has this data readily available and broken down into relevant therapeutic groups without having to engage a programmer to extract this information for every separate request.

REFERENCES

SAS Institute, Inc. (1990), SAS Language Reference, Version 6, First Edition, Cary, NC.

SAS Institute, Inc. (1998), SAS Macro Language Reference, Cary, NC.

Microsoft Word 97, Online Help.

1997 National Hospital Discharge Survey CD-ROM Series 13, 1997 National Ambulatory Medical Care Survey CD-ROM Series 13, and 1995 National Nursing Home Survey CD-ROM Series 13, U.S. Department of Health and Human Services, Centers for Disease Control and Prevention, National Center for Health Statistics.

1998 International Classification of Diseases, 9th Revision, Clinical Modification, Medicode.

ACKNOWLEDGMENTS

We would like to thank Carlyne Averell, Anne Hickey and Susan Eaton for their support and assistance with this project.

SAS is a registered trademark of SAS Institute, Inc. Other brand and product names are registered trademarks or trademarks of their respective companies.

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the authors at:

Nancy Bourgeois
Glaxo Wellcome, Inc.
5 Moore Drive
Research Triangle Park, NC 27709
Fax: 919-315-8981
Email: neb22289@glaxowellcome.com

Bobbie Coleman
Glaxo Wellcome, Inc.
5 Moore Drive
Research Triangle Park, NC 27709
Work Phone: 919-483-9212
Fax: 919-315-8981
Email: blc22701@glaxowellcome.com

Figure 1. Excerpt from the ICD9Table word document....

Figure 2. Skeleton Word document containing preset header and footer information...