Tabulating Patients, Admissions and Length-of-Stay by Dx Category, Fiscal Year, County and Age Group

Step One: Extracting Data

Use an array in a data step to search all the Dx code fields in one pass. An array groups n variables under a single name, so one indexed statement inside a loop can test every variable in the list.

    data dataset1;                                   /* Write extracted data to dataset1 */
        set datalib.phrudata;                        /* Read data from datalib.phrudata */
        length flag1 flag2 $ 1;
        flag1 = ' ';                                 /* Initialize flags for each relevant Dx category */
        flag2 = ' ';
        array name{n} var1-varn;                     /* Define name, number of elements and variables in the array */
        do i = 1 to n;                               /* Loop through each element in the array */
            if name{i} = 'dxcode1' then flag1 = '1'; /* Search for a particular Dx code and flag records where it appears */
            if name{i} = 'dxcode2' then flag2 = '1';
        end;                                         /* End loop */
        if flag1 ne ' ' or flag2 ne ' ';             /* Keep flagged records */
    run;

In this template, n, var1-varn, dxcode1 and dxcode2 stand for the number of Dx code fields, their variable names and the Dx codes you are searching for.

If you are searching for multiple Dx codes, use a separate flag for each one to avoid overwriting flags for other relevant Dx codes on the same record.

Keep in mind that although you are searching through n variables, they are all on the same record. Also note that a flag tells you only that the record contains at least one occurrence of the relevant Dx code, not how many times that code appears on the record.

Step Two: Creating Categorical Variables

Age Group:

    proc format;                    /* Create a format for the age groups to be defined later */
        value agegrpf
            1 = '00-04'
            2 = '05-09'
            /* ... */
            18-high = '85+';        /* 18-high means any value from 18 up */
    run;

The format must be defined before any step that uses it. Placing the format procedure immediately after the libname statements, ahead of all other procedures and data steps, guarantees this.
In the data step, calculate age and assign the age group:

    age = int((sepdate - dob) / 365.25);    /* Calculate age from date of birth and separation date */
    agegrp = int(age / 5) + 1;              /* Shortcut for calculating 5-year age groups */
    format agegrp agegrpf.;                 /* Format agegrp with the agegrpf. format */

The recorded age variable is not reliable, so age should be calculated from the date of birth and separation date fields. Dividing age by 5 is a shortcut for building 5-year age groups; the number you divide by determines the width of the groups. Note the required period at the end of the format name.

Step Three: Summarizing the Data

To count unique patients per fiscal year and calculate the mean number of admissions or LOS per patient, the data need some preliminary preparation. Rather than a dataset of all admissions, you end up with a summarized dataset of one record per patient per year, with cumulative admissions and length-of-stay.

    proc sort;                               /* Sort the data by patient and fiscal year */
        by msi fy;
    run;

    data file2;
        set file1;
        by msi fy;                           /* Set up the FIRST. and LAST. variables */
        length patients admits totlos 3;     /* Define numeric variables */
        if first.msi or first.fy then do;    /* Patient's first admission in FY */
            admits = 1;                      /* Initialize cumulative admissions counter */
            totlos = los;                    /* Initialize total LOS to LOS of first admission */
        end;
        else do;                             /* Subsequent admission in FY */
            admits + 1;                      /* Add 1 to cumulative admissions counter */
            totlos + los;                    /* Add LOS to cumulative LOS */
        end;
        if last.msi or last.fy;              /* Keep last summary record for each FY */
        patients = 1;                        /* Count unique patients */
    run;

* The FIRST. and LAST. variables, set up by the BY statement, mark the first and last occurrences of each unique value of a BY variable. Whenever MSI changes, the FY markers are set as well.

    MSI   MSI marker     FY   FY marker
    01    first          89   first
    01                   89
    01                   89   last
    01                   90   first
    01    last           90   last
    02    first          89   first, last
    02                   90   first, last
    02    last           91   first, last
    03    first, last    89   first, last
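As a quick check of the summarizing logic, the following sketch runs the same FIRST./LAST. pattern on a few made-up admissions. The dataset names admissions and peryear and all of the values are hypothetical; only the variable names msi, fy and los come from the steps above. It tests first.fy and last.fy alone, which is sufficient here because the FY markers are also set whenever MSI changes.

```sas
/* Hypothetical admissions: msi = patient id, fy = fiscal year, los = days */
data admissions;
    input msi $ fy los;
    datalines;
01 89 3
01 89 5
01 90 2
02 89 4
02 91 7
;
run;

proc sort data=admissions;
    by msi fy;
run;

data peryear;
    set admissions;
    by msi fy;
    if first.fy then do;     /* first admission for this patient in this FY */
        admits = 1;
        totlos = los;
    end;
    else do;                 /* subsequent admission in the same FY */
        admits + 1;
        totlos + los;
    end;
    if last.fy;              /* keep one summary row per patient per FY */
    patients = 1;
    drop los;
run;

proc print data=peryear noobs;
run;
```

With these five records, peryear should hold four rows: patient 01 has 2 admissions and 8 days in FY 89 and 1 admission and 2 days in FY 90, and patient 02 has one admission in each of FY 89 and FY 91.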
Step Four: Tabulating the Data

There are two options for tabulating and outputting the data: the means procedure and the tabulate procedure. The means procedure is straightforward, but is most useful when you have a limited number of dimensions to present. The tabulate procedure is more useful when you are presenting data in several dimensions (e.g. procedures by year, sex, age group and county). It usually requires a little trial-and-error programming, but can usually produce decent results.

Proc Means:

    proc means sum mean;               /* Select the statistics you want proc means to present */
        title1 'Title of Table';
        title2 'Title Subheading';
        class fy sex agegrp;           /* Categories by which you want the data presented */
        var patients admits totlos;    /* Variables you want in the output */
    run;

Using the data preparation outlined earlier, this procedure presents the number of unique patients (N and Σ PATIENTS), total and mean admissions per patient (Σ ADMITS, MEAN ADMITS), and total and mean length-of-stay per patient (Σ TOTLOS, MEAN TOTLOS). Disregard the mean of PATIENTS: every summary record has patients=1, so it is always 1.

Proc Tabulate:

    proc tabulate;
        title1 'Title of Table';
        title2 'Title Subheading';
        class fy county sex agegrp;    /* Define your categories */
        var patients admits totlos;    /* Define your variables */
        tables fy*(county all),        /* Page dimension */

The TABLES statement defines the presentation of your data. Commas separate the dimensions of the table. With two dimensions, the comma separates the vertical from the horizontal dimension. With three dimensions, the first comma separates pages and the second comma separates the vertical and horizontal dimensions on the page. The line above generates one page per fiscal year per county, with the last page of each fiscal year tabulating all counties for that year.
The TABLES statement continues with the vertical and horizontal dimensions:

               (sex all='M&F')*(agegrp all='All Ages'),    /* Vertical dimension */
               patients*sum*f=comma8.                      /* Sum of patients in 8-column comma format */
               admits*(sum*f=comma8. mean*f=8.2)           /* Sum and mean admissions */
               totlos*(sum*f=comma8. mean*f=8.2)           /* Sum and mean of total LOS per patient */
               / box=_page_ rts=50;
    run;

The vertical dimension crosses sex with age group, adds an all-ages summary for each sex, and repeats both for the two sexes combined. The box=_page_ option places the page heading in the upper left-hand corner of the table. The rts= option is optional and controls the number of columns allocated to the vertical category titles; it usually takes a little trial and error to get it to look right.

Proc Tabulate gives you much more control over how the data are presented. This is particularly useful when you are trying to condense data into the fewest pages.
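The effect of the commas is easiest to see on a small table. This sketch builds a hypothetical four-row summary dataset (the name demo and its values are invented for illustration) and tabulates it twice: once with two dimensions, and once with fiscal year added as a page dimension.

```sas
/* Hypothetical per-patient, per-year summary rows */
data demo;
    input fy sex $ admits;
    patients = 1;
    datalines;
89 M 2
89 F 1
90 M 1
90 F 3
;
run;

proc tabulate data=demo;
    class fy sex;
    var patients admits;
    /* Two dimensions: rows (sex), then columns (statistics) */
    tables sex all='M&F',
           patients*sum*f=comma8. admits*(sum*f=comma8. mean*f=8.2);
    /* Three dimensions: pages (fy), then rows, then columns */
    tables fy,
           sex all='M&F',
           patients*sum*f=comma8. admits*(sum*f=comma8. mean*f=8.2)
           / box=_page_;
run;
```

The first TABLES statement yields a single table. The second yields one page per fiscal year, each with the same rows and columns, and box=_page_ prints the fiscal year in the corner box of each page.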
The Full Program:

    /* Program to extract Hypertension admissions from CIHI95 */
    options ls=90 ps=78 sasautos='macro library';
    libname library '[project directory]';
    libname datalib '[data directory]';

    proc format;
        value agegrpf 1 = '00-04'               /* Create age group format */
                      2 = '05-09'
                      /* ... */
                      18-high = '85+';
    run;

    data hyper95;                               /* Write extracted data to hyper95 */
        set datalib.cihi95;                     /* Read data from CIHI95 */
        length flag $ 1;
        flag = ' ';                             /* Initialize flag */
        array dxcodes{10} dxcode1-dxcode10;     /* Define array as dxcode1-10 */
        do i = 1 to 10;                         /* Repeat loop 10 times */
            if dxcodes{i} =: '401' then flag = '1';    /* Flag records with 401 code */
        end;                                    /* End loop */
        if flag = '1';                          /* Keep flagged records */
        age = int((sepdate - dob) / 365.25);    /* Age group, as in Step Two */
        agegrp = int(age / 5) + 1;
        format agegrp agegrpf.;
    run;

    /* Repeat this data step for each year you are interested in */

    data combined;                              /* Combine data from all years */
        set hyper95 hyper94;                    /* Read all hyper datasets */
    run;

    proc sort;                                  /* Note: procedures default to the last written dataset */
        by msi fy;                              /* Sort data by MSI and FY */
    run;
    data sumhyper;
        set combined;
        by msi fy;                              /* Create FIRST. and LAST. identifiers for MSI and FY */
        length patients admits totlos 3;        /* Define numeric variables */
        if first.msi or first.fy then do;       /* Summarize admissions and LOS by FY */
            admits = 1;
            totlos = los;
        end;
        else do;
            admits + 1;
            totlos + los;
        end;
        if last.msi or last.fy;                 /* Keep summary record */
        patients = 1;                           /* Count patients */
    run;

    proc means sum mean;                        /* Output data with proc means */
        title1 'Hypertension Patients, Admissions & Length-of-Stay';
        title2 'by Fiscal Year, County, Sex & Age Group';
        class fy sex agegrp;
        var patients admits totlos;
    run;

    proc tabulate;                              /* Output data with proc tabulate */
        title1 'Hypertension Patients, Admissions & Length-of-Stay';
        title2 'by Fiscal Year, County, Sex & Age Group';
        class fy county sex agegrp;
        var patients admits totlos;
        tables fy*(county all),                            /* Page */
               (sex all='M&F')*(agegrp all='All Ages'),    /* Vertical dimension */
               patients*sum*f=comma8.                      /* Horizontal dimension */
               admits*(sum*f=comma8. mean*f=8.2)
               totlos*(sum*f=comma8. mean*f=8.2)
               / box=_page_ rts=50;
    run;
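To try the extraction logic without access to the CIHI files, the array search can be run on a few invented records. In this sketch the datasets toy and flagged, the three-field layout and the code values are all hypothetical; only the =: prefix comparison and the flag pattern are taken from the program.

```sas
/* Hypothetical records with three Dx code fields; a period reads as missing */
data toy;
    input id $ dxcode1 $ dxcode2 $ dxcode3 $;
    datalines;
A 4011 25000 .
B 25000 4280 .
C 7802 4019 4011
;
run;

data flagged;
    set toy;
    length flag $ 1;
    flag = ' ';
    array dxcodes{3} dxcode1-dxcode3;
    do i = 1 to 3;
        /* =: compares only as many leading characters as the shorter operand has */
        if dxcodes{i} =: '401' then flag = '1';
    end;
    if flag = '1';           /* keep records with at least one 401 code */
    drop i;
run;

proc print data=flagged noobs;
run;
```

Records A and C are kept and record B is dropped. Record C contains two codes beginning with 401 but still carries a single flag, which is why the flag cannot tell you how many times a code appears on a record.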