Working with Administrative Databases: Tips and Tricks
Canadian Institute for Health Information
Emerging Issues Team
Simon Tavasoli
Administrative Databases
> Administrative databases are often used to synthesize information about the health care system or to investigate health research questions
> The data may be derived from population registries, vital statistics or other records of life events, or from health claims and services data
> The Canadian Institute for Health Information (CIHI) collects or receives essential data and prepares analyses on Canada's health system and the health of Canadians
> CIHI currently holds more than 27 databases with millions of records (e.g. the National Ambulatory Care Registry contains millions of records each year)
Working with Administrative Databases: General Tips and Tricks
> Each day hundreds of employees conduct analyses using SAS
> Given the workload on the CIHI server, using resources wisely is important
> There is always a trade-off
> Efficiency can be measured in many ways
- Real time
- CPU time
- Memory
- Input/output
- Original programmer time
- Maintenance programmer time
System Options for Measuring Performance
> Options STIMER; (default)
NOTE: DATA statement used:
      real time           1.16 seconds
      cpu time            0.09 seconds
> Options FULLSTIMER;
NOTE: The SAS System used:
      real time                     0.14 seconds
      user cpu time                 0.01 seconds
      system cpu time               0.05 seconds
      Memory                        1452k
      Page Faults                   1
      Page Reclaims                 2349
      Page Swaps                    0
      Voluntary Context Switches    53
      Involuntary Context Switches  5
      Block Input Operations        1
      Block Output Operations       0
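A minimal sketch of switching between the two options around a step you want to measure. The data set name work.timing_demo and the loop are made up for illustration; the figures written to the log depend on the host and will differ from those shown on the slide.

```sas
/* Ask for the detailed resource report for the steps that follow */
options fullstimer;

/* Hypothetical workload: one million square roots */
data work.timing_demo;
    do i = 1 to 1000000;
        x = sqrt(i);
        output;
    end;
run;

/* Go back to the shorter real time / cpu time note */
options nofullstimer stimer;
```

Comparing the log notes before and after a change to a program is a simple way to see whether the change actually saved real time, CPU time or memory.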
Optimizing performance
* Optimize performance by reducing CPU time
- Check the program using a _NULL_ data set or the OBS= option
- Use WHERE vs. IF
- Use DROP and KEEP statements
- Issues with merging data
- Avoid unnecessary DATA steps or sorting
- Manipulation of data with IF/THEN/ELSE statements
- Dealing with resource-intensive calculations
* Keep the libraries clean
* Reduce the size of the tables using COMPRESS=YES
When checking your programs, use a null data set or limit the number of observations
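The slide's original code is not preserved, so the following is only a sketch of the two approaches, using the sample tables sashelp.class and sashelp.cars as stand-ins for a large administrative table. The output name work.test_run is hypothetical.

```sas
/* 1. DATA _NULL_ runs the step logic without writing an output data set */
data _null_;
    set sashelp.class;
    bmi = (weight / height**2) * 703;
    if _n_ <= 5 then put name= bmi= 6.1;   /* inspect a few rows in the log */
run;

/* 2. The OBS= data set option limits how many observations are read */
data work.test_run;
    set sashelp.cars(obs=100);
    margin = msrp - invoice;
run;
```

Once the logic looks right, remove OBS= (or replace _NULL_ with the real output name) and run against the full table.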
Subsetting Datasets: WHERE vs. IF statements
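A short sketch of the difference, assuming the sample table sashelp.cars in place of a real database extract. Both steps produce the same rows. WHERE filters observations before they are brought into the program data vector, while a subsetting IF tests each observation after it has been read, so WHERE usually does less work.

```sas
/* WHERE: observations are filtered as they are read */
data work.suv_where;
    set sashelp.cars;
    where type = 'SUV';
run;

/* Subsetting IF: every observation is read, then tested */
data work.suv_if;
    set sashelp.cars;
    if type = 'SUV';
run;
```

WHERE can only refer to variables that already exist in the input data set; a condition on a variable computed inside the step still needs IF.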
Process only the variables that you need
> Need only two variables
(Social Sciences Computing Cooperative)
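One way to do this, sketched with sashelp.cars as a stand-in and two arbitrarily chosen variables, is the KEEP= data set option on the input table, so the other columns are never loaded into the step.

```sas
/* Only MAKE and MSRP are read from the input table */
data work.two_vars;
    set sashelp.cars(keep=make msrp);
run;
```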
Subsetting datasets
Subsetting datasets: KEEP Statement
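The code from these slides is not preserved. As an illustration only, the sketch below contrasts the KEEP= data set option on input (controls what is read) with the KEEP statement (controls what is written). The table sashelp.cars and the variable margin are assumptions chosen for the example.

```sas
data work.price_summary;
    /* KEEP= on the input: only these four variables are read */
    set sashelp.cars(keep=make model msrp invoice);
    margin = msrp - invoice;
    /* KEEP statement: only these three variables are written out */
    keep make model margin;
run;
```

DROP works the same way in both positions when it is shorter to list the variables to exclude.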
Some other Shortcuts
Merging data
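The specific merging issues shown on these slides are not preserved. The sketch below only illustrates a common pattern: sort both tables by the key, merge with BY, and use IN= flags to control which observations are kept. The tables work.patients and work.visits and all their values are invented for the example.

```sas
/* Hypothetical input tables */
data work.patients;
    input patient_id age;
    datalines;
1 34
2 51
3 47
;
run;

data work.visits;
    input patient_id visit_count;
    datalines;
1 2
3 5
4 1
;
run;

/* A match-merge requires both inputs to be sorted by the BY variable */
proc sort data=work.patients; by patient_id; run;
proc sort data=work.visits;   by patient_id; run;

data work.matched;
    merge work.patients(in=in_p) work.visits(in=in_v);
    by patient_id;
    if in_p and in_v;   /* keep only IDs present in both tables */
run;
```

Here only patient_id 1 and 3 appear in both inputs, so only those two observations are kept. Without the IN= test, unmatched IDs would be kept with missing values for the variables from the other table.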
When only one condition can be true for a given observation, write a series of IF-THEN/ELSE statements.
(Social Sciences Computing Cooperative)
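A minimal sketch of the idea, using sashelp.class and made-up age bands. With ELSE, SAS stops testing as soon as one condition is true; with a series of independent IF statements every condition is evaluated for every observation.

```sas
data work.age_groups;
    set sashelp.class;
    length age_group $8;
    /* Evaluation stops at the first true condition */
    if age < 13 then age_group = 'Child';
    else if age < 16 then age_group = 'Teen';
    else age_group = 'Older';
run;
```

Where the distribution of the data is known, putting the most frequent condition first reduces the number of comparisons further.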
Perform resource-intensive calculations and comparisons only once
(Social Sciences Computing Cooperative)
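As an illustration (not the slide's original code), the sketch below stores the result of an expression in a variable and reuses it, rather than repeating the expression in each comparison. The table sashelp.class, the formula and the cut-offs are assumptions for the example.

```sas
data work.bmi_flags;
    set sashelp.class;
    length category $11;
    /* Compute the expression once and store the result */
    bmi = (weight / height**2) * 703;
    /* Reuse the stored value in every comparison */
    if bmi < 18.5 then category = 'Underweight';
    else if bmi < 25 then category = 'Normal';
    else category = 'Above range';
run;
```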
Assign many values in one statement
(Social Sciences Computing Cooperative)
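The slide's example is not preserved, so this is one plausible reading of the tip rather than the author's code: a single RETAIN statement with initial values sets several constants once, instead of several assignment statements that execute for every observation. All variable names and values below are hypothetical.

```sas
data work.class_flags;
    set sashelp.class;
    /* One statement initializes all three constants, once, at compile time */
    retain child_max 12 teen_max 15 bmi_factor 703;
    bmi      = (weight / height**2) * bmi_factor;
    is_child = (age <= child_max);
    is_teen  = (child_max < age <= teen_max);
run;
```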
Dealing with Missing Values
> Put missing values last in expressions
> Check for missing values before using a variable in multiple statements
(Social Sciences Computing Cooperative)
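Both tips can be sketched as follows. The table work.scores and its values are invented, and the example assumes bonus is the variable most likely to be missing.

```sas
/* Hypothetical input with some missing values */
data work.scores;
    input id base extra bonus;
    datalines;
1 10 5 2
2 12 4 .
3 . 6 1
;
run;

data work.totals;
    set work.scores;
    /* The variable most likely to be missing goes last in the expression */
    total = base + extra + bonus;
    /* Test BASE once, then skip every statement that depends on it */
    if not missing(base) then do;
        base_doubled = base * 2;
        base_squared = base ** 2;
    end;
run;
```

Grouping the dependent statements in one DO block means a missing value is detected by a single test instead of being propagated through each calculation.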
Avoid unnecessary sorting
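The slide's example is not preserved. One common case, sketched here with sashelp.cars, is group-level statistics: a CLASS statement in PROC MEANS does not require sorted input, so the PROC SORT that a BY statement would need can be skipped.

```sas
/* No PROC SORT needed: CLASS groups unsorted data */
proc means data=sashelp.cars mean;
    class type;
    var msrp;
run;
```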
If several different subsets are needed, avoid rereading the data for each subset
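A minimal sketch of creating several subsets in a single pass, assuming sashelp.cars as the input and hypothetical output names. The DATA statement lists every output table, and OUTPUT routes each observation to one of them.

```sas
/* One read of the input, three output tables */
data work.suv work.sedan work.other;
    set sashelp.cars(keep=make model type msrp);
    if type = 'SUV' then output work.suv;
    else if type = 'Sedan' then output work.sedan;
    else output work.other;
run;
```

The alternative, three separate DATA steps each with its own WHERE or IF, would read the full input three times.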
Keep your SAS environment clean
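One way to clean up, shown as a sketch with made-up table names, is to delete intermediate tables with PROC DATASETS as soon as they are no longer needed.

```sas
/* Hypothetical intermediate tables */
data work.temp1 work.temp2;
    x = 1;
run;

/* Delete them by name; NOLIST suppresses the directory listing in the log */
proc datasets library=work nolist;
    delete temp1 temp2;
quit;
```

Adding the KILL option to the PROC DATASETS statement deletes every member of the library instead, which is worth using only on a library you are sure holds nothing you need.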
COMPRESS=
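A short sketch of the two places COMPRESS= can be set, using sashelp.cars and a hypothetical output name. As a data set option it applies to one table; as a system option it applies to tables created afterwards in the session.

```sas
/* Compress a single output table */
data work.cars_compressed(compress=yes);
    set sashelp.cars;
run;

/* Or compress every table created from this point on */
options compress=yes;
```

The log reports how much the size changed. Compression costs some CPU time on every read and write, and small or narrow tables can end up larger, so it pays off mainly on wide tables with long character variables.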