Seeking, Mapping, and Fuzzy Merging Data Structures in a Networked Environment Haftan Eckholdt, Albert Einstein College of Medicine, Bronx, NY


ABSTRACT

The challenges of long-term, multi-site projects include the acquisition, management, and analysis of data (files) distributed across many workstations and many points in time. Finding, reading, and reconciling these files often involves data loss rates beyond tolerance for tracking and analysis in longitudinal / repeated-measurement projects. SAS MACRO and the SAS X command can be used like a robot to search drives and directories across networks in order to seek and map file structures. The product of file mapping can then be used to read eligible files and fuzzy merge them using various spatio-temporal criteria to produce final (wide or deep) data structures linked with highest-to-lowest probability of match. Several examples of these processes will be demonstrated, including: (1) the first part of this process, to document hard drives across a small Novell / NT network; (2) the second part of this process, to merge very large lists in a random digit dial computer aided telephone interview (RDD-CATI) laboratory; and (3) a combination of all of the above to openly seek, map, acquire, merge, and analyze data in a laboratory of functional neuroanatomy/neurophysiology.

INTRODUCTION

Researchers today are challenged with managing rapidly growing multi-site data structures, a trend that is likely to accelerate with internet and intranet evolution and usage. It is not unusual for analysts to manage and manipulate hundreds of thousands of files of varying formats residing in many known and unknown places, and to review hundreds of pages of images, wave forms, topographic densities, or metabolic events, each representing one spatial perspective at one point in time in one condition from one unit of observation. A typical experiment or project often involves thousands of files, each with thousands of data points.
It is not unusual for laboratories to collect a gigabyte of relevant data points on each subject or event. Nor is it unusual for the drive, directory (sub-sub-sub), and file naming schemes to have no symmetric logic and no documentation. Some researchers have no idea where data have been stored or what the files are named. This reality is not a sign of bad science, just bad technical management. But bad technical management means that files cannot be found for analysis, and lost data begins to look like bad science as many great hypotheses fall between the cracks of time and space along with their data. To date, there are no self-organizing software systems that can take the place of good technical management, so scientists are left to their own (de)vices.

PROPOSED SOLUTIONS

This manuscript describes the four steps of seeking data files, mapping data files, acquiring data, and merging data. These steps can be used in many applications, but the examples come from efforts to find, acquire, and merge data from the Einstein Aging Study, a longitudinal study of Alzheimer's Disease funded by the National Institute on Aging.

SEEKING DATA FILES

The first step, and perhaps the only step, in seeking data files is to identify the hard drive(s) to be mapped. This is a relatively simple bootstrap to the entire process, but it may require that the user map networked drives to the local workstation. The reader should note that the user on the SAS processing workstation must have the privileges of a network administrator to do much of this work, and that the hard drives to be mapped must already have been assigned share status, along with users and passwords. These procedures will not map non-local drives invasively without prior permissions and explicit privileges being established. The code below starts with the NOXWAIT option, which turns off the X command behavior of opening an operating system window and waiting for the user to exit that window before continuing to the next step.
This is a critical start, since hundreds of thousands of commands may be delivered to the local operating system. Waiting for user input would turn an 8 hour job into an 8 month job. The following code then creates two SAS libraries (a temp library and a long-term library). It then turns off the SAS log, which will otherwise crash almost any operating system or hard disk array when the number of files mapped and merged goes into the thousands. Finally, it sets a SAS MACRO variable that names drive N as the drive to be mapped.

    OPTIONS PAGENO=1 NOXWAIT;
    LIBNAME WORK "C:\TEMP\SAS\";
    LIBNAME NESUG "C:\NESUG\2000\DATA\";
    FILENAME NOLOG DUMMY;
    PROC PRINTTO LOG=NOLOG;
    RUN;
    %LET DRIVE = N;

The next step uses the SAS X command, which sends the quoted string to the operating system. This code simulates opening a local operating system window and typing:

    DIR N:\ > C:\NESUG\2000\DATA\DIR1.TXT

This text uses the Microsoft DOS dir command to list the contents of the N: drive root directory. The most important feature of this command is the > sign, which tells the operating system to send the output of the dir command to the named file rather than to the screen. In this case, the target file is called DIR1.TXT; it will consist of ASCII text and reside in the NESUG library on the local C drive.

    X "DIR L:\EAS\DATA\ARCHIVE > D:\AECOM\EAS\DATA\UNARCH\DIR1.TXT";

The output file looks something like the following:

     Volume in drive N has no label.
     Volume Serial Number is 164A-18D6

     Directory of N:\

    02/26/00  08:15a      <DIR>          FOUND
    /27/00  05:06p      <DIR>          I386
    01/27/00  05:10p      <DIR>          SP3
    01/27/00  05:23p      <DIR>          WINNT
    01/27/00  05:21p      <DIR>          LRVR
    06/28/00  01:46p           212,860,928 pagefile.sys
    01/27/00  05:30p      <DIR>          Program Files
    01/27/00  05:31p      <DIR>          TEMP
    01/27/00  05:33p      <DIR>          WinUtils
    01/27/00  05:34p      <DIR>          ToshUtil
    01/27/00  05:34p      <DIR>          hddpwdnt
    01/28/00  02:43p      <DIR>          Win32App
    01/28/00  02:48p      <DIR>          SAS
    01/28/00  03:21p      <DIR>          Multimedia Files
    01/28/00  03:31p      <DIR>          lotus
    01/28/00  04:18p      <DIR>          PASS60
    02/18/00  07:45p      <DIR>          Iomg_NT
    06/30/00  12:48p                     0 DIR1.TXT
                  18 File(s)    212,860,928 bytes
                                 88,768,512 bytes free

MAPPING DATA FILES

As one can see, the N drive is someone's C: drive, since it contains all of the operating system and software directories that one would expect to find on the root. The following code performs file mapping by reading the DIR output file as a data file. In this example, the SAS binary file will be called DIR1 and stored in the NESUG library. Under Windows NT 4.0 Workstation, the variables read in include the file creation date, the file creation time (including am/pm), the directory descriptor (whether the object is a file or a directory), and finally the file or directory name. None of these data appear until the 8th line, so the FIRSTOBS option must be used to skip over the file header garbage.

    DATA NESUG.DIR1;
      INFILE "D:\AECOM\EAS\DATA\UNARCH\DIR1.TXT" FIRSTOBS = 8;
      INPUT FDATE MMDDYY8. FTIME TIME7. AMPM $ FDIR $ FNAME $;
      * KEEP THE DIRECTORY ROWS ONLY;
      IF FDIR NE "<DIR>" THEN DELETE;
    PROC SORT;
      BY FDIR;
    * SAVE THE NUMBER OF DIRECTORIES AS MACRO VARIABLE SAMPLE;
    DATA _NULL_;
      CALL SYMPUT ("SAMPLE",COUNT);
      STOP;
      SET NESUG.DIR1 NOBS=COUNT;
    RUN;

Prior to using this code, which was written and optimized to run on a Microsoft Windows NT 4.0 workstation (SP3), the user must submit the quoted X command string and verify the structure of the DIR output file. The next lines in the code perform the special task of acquiring the directories only, creating a temporary SAS data file that contains only the directory-tagged observations from the DIR output file. The number of directories in the DIR output file is also saved as a macro variable called SAMPLE, to be used in the next MACRO procedure, called LOOK. This macro loops once for each observation (directory) in the DIR output file (ASCII version called DIR1.TXT and SAS version called DIR1.SD2), using the %DO loop until the %END at the bottom. For each directory on N:, a macro variable called SDIR is created, which is a space-trimmed version of FNAME. This allows the macro LOOK to perform the same DIR X command on each and every directory in DIR1.SD2; the results are called DIR2.TXT and DIR2.SD2 respectively.
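The row-counting and looping pattern described above can be tried in isolation before it is aimed at a real drive. The following is a minimal sketch rather than the author's macro: the data set WORK.DIRS, its three directory names, and the macro name WALK are hypothetical stand-ins, and %PUT replaces the X command so that nothing is sent to the operating system.

```sas
/* Hypothetical stand-in for the parsed DIR listing: three directory names. */
data work.dirs;
  length fname $ 12;
  input fname $;
  datalines;
SAS
TEMP
WINNT
;
run;

/* NOBS= is filled in at compile time, so the count is available   */
/* before any row is read; STOP ends the step without reading one. */
data _null_;
  call symput("SAMPLE", trim(left(put(count, 8.))));
  stop;
  set work.dirs nobs=count;
run;

%macro walk;
  %do l = 1 %to &sample;
    /* Read only row &l and copy its trimmed name into SDIR. */
    data _null_;
      set work.dirs (firstobs=&l obs=&l);
      call symput("SDIR", trim(fname));
    run;
    /* In the real job an X "DIR ..." command would go here. */
    %put NOTE: pass &l would list N:\&sdir;
  %end;
%mend walk;
%walk
```

Selecting the row with FIRSTOBS= and OBS= reads a single observation per pass, whereas testing _N_ inside the step reads the whole data set each time; with thousands of directories the difference may matter.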
    %MACRO LOOK;
    %DO L = 1 %TO &SAMPLE;
      * PICK UP THE NAME OF DIRECTORY NUMBER &L;
      DATA NESUG.DIR1;
        SET NESUG.DIR1;
        IF _N_ = &L THEN CALL SYMPUT ("SDIR",TRIM(FNAME));
      RUN;
      X "DIR N:\&SDIR > C:\NESUG\2000\DATA\DIR2.TXT";
      DATA NESUG.DIR2;
        INFILE "C:\NESUG\2000\DATA\DIR2.TXT" FIRSTOBS = 8;
        INPUT FDATE2 MMDDYY8. FTIME2 TIME7. AMPM2 $ FDIR2 $ FNAME2 $;
        DIR1 = "&SDIR";
        IF FDIR2 NE "<DIR>" THEN DELETE;
      PROC SORT;
        BY FNAME2;
      DATA NESUG.DIR2;
        SET NESUG.DIR2;
        BY FNAME2;
        CALL SYMPUT ("SDIR2",PUT(FNAME2,Z8.));
      RUN;

At this point, the reader should understand that this process can continue indefinitely until there are no more subdirectories, with minor edits to the macro syntax and logic. A few additional procedures can then be used to bring about a single file structure that contains all of the files and all of their directory tree information. Note that the first time around, a file must be created rather than appended (%* commented out below).

      PROC TRANSPOSE DATA=NESUG.DIR2 OUT=NESUG.DIR2H NAME=File PREFIX=FILE;
        VAR FNAME;
      %*DATA NESUG.ARCHIVE;
      DATA NESUG.DIR2H;
        ATTRIB DIR1 LENGTH=$9;
        ATTRIB DIR2 LENGTH=$8;
        SET NESUG.DIR2H;
        DIR1 = "&DIR1";
        DIR2 = "&DIR2";
      PROC APPEND BASE=NESUG.ARCHIVE DATA=NESUG.DIR2H FORCE;
    %END;
    %MEND LOOK;
    %LOOK

In this last step, a frequency distribution is established for the number of directories and files.

    DATA NESUG.ARCHIVE;
      SET NESUG.ARCHIVE;
      * COUNT THE NON-BLANK FILE NAMES IN EACH ROW;
      NFILE = 0;
      IF FILE1 NE " " THEN NFILE = NFILE + 1;
      IF FILE2 NE " " THEN NFILE = NFILE + 1;
      IF FILE3 NE " " THEN NFILE = NFILE + 1;
      IF FILE4 NE " " THEN NFILE = NFILE + 1;
      IF FILE5 NE " " THEN NFILE = NFILE + 1;
      IF FILE6 NE " " THEN NFILE = NFILE + 1;
      IF FILE7 NE " " THEN NFILE = NFILE + 1;
      IF FILE8 NE " " THEN NFILE = NFILE + 1;
      IF FILE9 NE " " THEN NFILE = NFILE + 1;
      IF FILE10 NE " " THEN NFILE = NFILE + 1;
    PROC MEANS N MIN MAX MEAN SUM;
      TITLE2 "ARCHIVE READING FOR NESUG 2000";
      VAR NFILE;
    PROC FREQ;
      TABLES NFILE;
    RUN;

A next step in file mapping might also involve a summary of all file names and file types. This will help in deciding which kinds of files will be acquired next.

    %MACRO LOOK2;
    %DO A = 1 %TO 10;
      DATA FILE&A;
        SET NESUG.ARCHIVE;
        FILE = FILE&A;
        * SPLIT THE NAME FROM THE EXTENSION AT THE FIRST DOT;
        DOT = INDEX(FILE,".");
        IF DOT > 1 THEN FILEN = SUBSTR(FILE,1,DOT-1);
        IF DOT > 0 THEN FILET = SUBSTR(FILE,DOT+1);
        KEEP DIR1 DIR2 FILE FILEN FILET;
    %END;
    DATA NESUG.ARCHFILE;
      SET %DO B = 1 %TO 10; FILE&B %END; ;
    %MEND LOOK2;
    %LOOK2
    PROC SORT;
      BY DIR1 DIR2 FILET;
    PROC FREQ;
      TABLES DIR1 DIR2;
    PROC FREQ;
      TABLES FILET;
      BY DIR1;
    RUN;

ACQUIRING DATA

It is now necessary to clear the SAS temp libraries to prevent any rereading of data structures. A selected group of directories and file types will now be read. Accordingly, the reader must know the format and syntax of the files to be read. In this example, a series of vertical data files will be read in from the SAS directory. These files will then be combined into a single vertical SAS data file for analysis.

    PROC DATASETS LIBRARY=WORK KILL;
    DATA NESUG.FLA;
      SET NESUG.ARCHFILE;
      IF DIR1 = "SAS" AND DIR2 = "WORK" AND FILET = "FL";
    PROC SORT NODUPKEY;
      BY FILEN FILET;
    RUN;
    DATA _NULL_;
      CALL SYMPUT ("SAMPLE",COUNT);
      STOP;
      SET NESUG.FLA NOBS=COUNT;
    RUN;
    %MACRO LOOK3;
    %DO F = 1 %TO &SAMPLE;
      DATA NESUG.FLA;
        SET NESUG.FLA;
        IF _N_ = &F THEN CALL SYMPUT ("DIR1",TRIM(DIR1));
        IF _N_ = &F THEN CALL SYMPUT ("DIR2",TRIM(DIR2));
        IF _N_ = &F THEN CALL SYMPUT ("FILE",TRIM(FILEN));
      RUN;
      DATA FLA&F;
        INFILE "N:\&DIR1\&DIR2\LEARN\&FILE..FL" DLM=',' DSD RECFM=V LRECL=100;
        INPUT Last : $CHAR15. First : $CHAR15.
            / ID
            / Age
            / YrWave : $CHAR3.
            / FCSR : $CHAR12.
            / Date : $CHAR8.
            / FL0701
            / FL0801
            / CT01 $ TR01 $ FL0103 FL0104
            / RC0101 XT0101 : $CHAR12. XT0102 : $CHAR12. XT0103 : $CHAR12. RT0104
            / CT02 $ TR02 $ FL0203 FL0204
            / ETC ;
        ARCHID = &DIR1;
        ARCHWA = &DIR2;
        ARCHFI = "&FILE";
    %END;
    DATA EAS.FLA;
      SET %DO F = 1 %TO &SAMPLE; FLA&F %END; ;
    %MEND LOOK3;
    %LOOK3

Further along in this process, some variables need to be combined for modeling.
    %MACRO LOOK4;
    %LET VAR = ;  %* LIST OF VALUES NOT SHOWN;
    %LET DIG = ;  %* LIST OF VALUES NOT SHOWN;
    %DO V = 1 %TO 16;
      %DO D = 1 %TO 3;
        %* SPLIT EACH XT STRING AT ITS FIRST BLANK INTO AN RT PART AND AN RC PART;
        S%SCAN(&VAR,&V)%SCAN(&DIG,&D) = INDEX(XT%SCAN(&VAR,&V)%SCAN(&DIG,&D)," ");
        RT%SCAN(&VAR,&V)%SCAN(&DIG,&D) = SUBSTR(XT%SCAN(&VAR,&V)%SCAN(&DIG,&D),1,
          S%SCAN(&VAR,&V)%SCAN(&DIG,&D)-1);
        RC%SCAN(&VAR,&V)%SCAN(&DIG,&D) = SUBSTR(XT%SCAN(&VAR,&V)%SCAN(&DIG,&D),
          S%SCAN(&VAR,&V)%SCAN(&DIG,&D)+1);
      %END;
    %END;
    %MEND LOOK4;

    * THE MACRO CALL AND THE ASSIGNMENTS BELOW BELONG INSIDE A DATA STEP;
    %LOOK4
    YEAR = SUBSTR(YRWAVE,1,2);
    WAVE = SUBSTR(YRWAVE,3,1);
    MM = SUBSTR(DATE,1,2);
    DD = SUBSTR(DATE,4,2);
    YY = SUBSTR(DATE,7,2);
    EVENT = 1;
    IF ID = LAG1(ID) AND YRWAVE = LAG1(YRWAVE) THEN EVENT = 2;
    PROC FORMAT;
      VALUE EV 1 = "Instant" 2 = "Delay";
    PROC PRINT;
      TITLE3 "FL A SET Data Acquisition";
    PROC FREQ;
      TABLES EVENT;
      FORMAT EVENT EV.;
    RUN;

MERGING DATA

Finally, we reach the process of fuzzy merging. In this example, fuzzy merging comes from a different part of the study: the development of the CATI RDD phone sample. The source file contains the name and address of a potential subject, while a separate file contains the name, address, and phone number of all people in the white pages directory for the same county. The essential logic of the system involves the separation of source names into character string groups based on the length of the source. That way, a short last name of 4 characters is only merged with potential targets of matching length. This initial process greatly decreases total CPU time (this code ran as a serial process for approximately 60 days) and increases accuracy (a number was found for about 50% of the sources, with an accuracy in excess of 99%).
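The effect of grouping by name length can be seen on a toy example. The sketch below is an illustration under assumed data, not the production code: the SOURCE and TARGET data sets, the names and phone numbers in them, and the variable names LAST_T and CUTL are invented, and PROC SQL is used here in place of the per-length data sets that the paper builds.

```sas
/* Hypothetical miniature source list (names to be looked up). */
data source;
  length last $ 12;
  input last $;
  cutl = length(last);   /* block key: length of the last name */
  datalines;
LEE
SMITH
BROWN
;
run;

/* Hypothetical miniature target list (phone book). */
data target;
  length last_t $ 12 phone $ 8;
  input last_t $ phone $;
  cutl = length(last_t);
  datalines;
LEE 555-0101
KIM 555-0102
SMITH 555-0103
SMYTH 555-0104
JOHNSON 555-0105
;
run;

/* Pair each source name only with targets in the same length block. */
proc sql;
  create table pairs as
  select s.last, t.last_t, t.phone, s.cutl
  from source as s, target as t
  where s.cutl = t.cutl
  order by s.last, t.last_t;
quit;
```

With three sources and five targets, an unrestricted comparison would form 15 pairs; restricting candidates to equal name lengths leaves 6.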
    LIBNAME EAS "D:\AECOM\EAS\HCFA98\DATA\";
    LIBNAME TEMP "F:\TEMP\";
    FILENAME NOLOG DUMMY;
    PROC PRINTTO LOG=NOLOG PRINT="F:\TEMP\sample.LST";

    ** READ IN PHONE BOOK ASCII DATA FILE;
    DATA EAS.PROCD2B;
      INFILE "D:\AECOM\EAS\HCFA98\DATA\PROCD2B.PRN" LRECL=74 PAD;
      * COLUMN RANGES AFTER NAMEP NOT SHOWN;
      INPUT NAMEP $ 1-20 ADDPH $ ZIPP PHONE $;
      * SET MARKER;
      PROCD = 1;
      * GRAB FIRST WORD AS LAST NAME, SECOND WORD AS FIRST NAME;
      LAST=SCAN(NAMEP,1);
      FIRPH=SCAN(NAMEP,2);
      SECPH=SCAN(NAMEP,3);
      * FIND LENGTH OF EACH NAME;
      CUTLPH=LENGTH(LAST);
      CUTFPH=LENGTH(FIRPH);
      LABEL ADDPH = "Phone Address"
            NAMEP = "Phone Name";

    ** READ IN HCFA ASCII DATA FILE;
    DATA EAS.HCFA98;
      INFILE "D:\AECOM\EAS\HCFA98\DATA\FOREIGN.TXT" LRECL=205 PAD;
      * COLUMN RANGES BEFORE SEX NOT SHOWN;
      INPUT NAMEH $ ADDHC $ BDYY BDDDD DDYY DDDDD ZIPH SEX 187 RACE 188;
      * SET MARKER;
      HCFA=1;
      * GRAB FIRST WORD AS LAST NAME, SECOND WORD AS FIRST NAME;
      LAST=SCAN(NAMEH,1);
      FIRHC=SCAN(NAMEH,2);
      SECHC=SCAN(NAMEH,3);
      * FIND LENGTH OF EACH NAME;
      CUTLHC=LENGTH(LAST);
      CUTFHC=LENGTH(FIRHC);
      AGE = 98-BDYY + ((365 - BDDDD)/365);
      IF AGE < 55 THEN AGE = AGE;
      AGE80_01=1;
      IF AGE < 80 THEN AGE80_01 = 0;
      DEATH = 0;
      IF DDYY > . THEN DEATH = 1;
      RACE2 = RACE;
      IF RACE2 = 0 THEN RACE2 = 3;
      IF RACE2 > 2 THEN RACE2 = 3;
      LABEL AGE80_01 = "Age Strata"
            ADDHC = "HCFA Address"
            NAMEH = "HCFA Name"
            RACE2 = "Race";
    PROC FORMAT;
      VALUE DEA 0="Alive" 1="Dead";
      VALUE AGE 0="<80" 1="80+";
      VALUE SEX 1="Male" 2="Female";
      VALUE RAC 1="White" 2="Black" 3="Other";
      VALUE MAT 0="No Match" 1="Match";
    PROC CONTENTS;
      TITLE3 "HCFA LIST DESCRIPTION";
    PROC FREQ;
      TABLES DEATH;
      TABLES RACE2;
      TABLES AGE80_01;
      TABLES SEX;
      FORMAT DEATH DEA. RACE2 RAC. AGE80_01 AGE. SEX SEX.;
    PROC UNIVARIATE PLOT NORMAL;
      VAR AGE;

    *** BEGIN CUTTING PHONE DATA;
    * NOTE THERE ARE LAST NAMES WITH 1 TO 17 CHARACTERS;
    %MACRO PHONCUT;
    %DO P = 1 %TO 17;
      %* MAKE DATA SET FOR EACH LAST NAME LENGTH / FIRST NAME;
      DATA EAS.PHONB&P;
        SET EAS.PROCD2B;
        IF CUTLPH = &P;
    %END;
    %MEND PHONCUT;

    ** BEGIN CUTTING HCFA DATA;
    * NOTE THERE ARE LAST NAMES WITH 1 TO 19 CHARACTERS;
    %MACRO HCFACUT;
    %DO H = 1 %TO 19;
      %* MAKE DATA SET FOR EACH LAST NAME LENGTH / FIRST NAME;
      DATA EAS.HCFAB&H;
        SET EAS.HCFA98;
        IF CUTLHC = &H;
    %END;
    %MEND HCFACUT;

    ** RUN MACROS;
    %PHONCUT
    %HCFACUT

Note that at this point in the process, the data structures (source and target candidates) are complete and the process of repeated merging begins. In order to maximize successful merging, each source observation must be merged with all same-length targets. This is very CPU intensive (for more than 100,000 sources)! Note too that this process is initially conducted on the very first source candidate to establish the files, and then looped through the remaining source candidates using PROC APPEND.

    ***** BEGIN BANGING THE FULL NAME DATA STRUCTURES;
    * CLEAR THE WORKING BASE APPENDING DATASET;
    data temp.hcphbase;
      set;
    %MACRO SAMPLE;
    %DO B = 1 %TO 17;
      %* COUNT THE HCFA OBS OF THIS NAME LENGTH;
      data _null_;
        set eas.hcfab&b;
        call symput("last",trim(left(_n_)));
      %* CALL PHONE HCFA INTO BEING;
      DATA PH;
        SET EAS.PHONB&B;
      PROC SORT;
        BY LAST;
      DATA HC;
        SET EAS.HCFAB&B;
        %* GRAB FIRST HCFA OBS;
        IF _N_ = 1;
        stack=1;
      %* MERGE FIRST HCFA OBS WITH PHONE DATA SET MATCH = 1;
      DATA HCPH;
        MERGE HC PH;
        BY LAST;
      DATA temp.hcphb&b;
        SET HCPH;
        MATCH = 0;
        IF HCFA = 1 AND PROCD = 1 THEN MATCH = 1;
        IF MATCH=1 OR (MATCH=0 AND HCFA = 1);

This part of the code develops a measure of match accuracy based on the number of characters in the name that match in the source and the target, as well as the number of characters in the address that match in the source and the target. Each source observation then has a dataset of matched target candidates that is sorted so that the highest match can be saved.
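The counting scheme can be written compactly for a single pair of names. The sketch below is a simplified stand-in for the statements in the paper, assuming the score is the number of leading characters that agree, up to five; the data set SCORE and its four name pairs are made up for illustration.

```sas
/* Hypothetical name pairs: FIRHC from the source, FIRPH from the target. */
data score;
  length firhc firph $ 10;
  input firhc $ firph $;
  disnam = 0;
  /* Count agreeing leading characters (at most 5), */
  /* stopping at the first position that differs.   */
  do i = 1 to 5;
    if substr(firhc, i, 1) ne substr(firph, i, 1) then leave;
    disnam = disnam + 1;
  end;
  drop i;
  datalines;
JOHN JOHN
JOHN JON
MARY MARK
ANNA BETH
;
run;

proc print data=score;
run;
```

On these rows DISNAM comes out as 5, 2, 3, and 0. The first pair scores 5 rather than 4 because the blank padding after a four-letter name also compares equal, which appears to be how the character-by-character comparisons in the paper behave as well.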
        %* RUN THE MATCH COUNTING [DIS_TANCE] SEQUENCE;
        DISNAM = 0;
        IF substr(firhc,1,1) = substr(firph,1,1)
          THEN disnam = disnam + 1;
        IF substr(firhc,1,1) = substr(firph,1,1) AND
           substr(firhc,2,1) = substr(firph,2,1)
          THEN disnam = disnam + 1;
        IF substr(firhc,1,1) = substr(firph,1,1) AND
           substr(firhc,2,1) = substr(firph,2,1) AND
           substr(firhc,3,1) = substr(firph,3,1)
          THEN disnam = disnam + 1;
        IF substr(firhc,1,1) = substr(firph,1,1) AND
           substr(firhc,2,1) = substr(firph,2,1) AND
           substr(firhc,3,1) = substr(firph,3,1) AND
           substr(firhc,4,1) = substr(firph,4,1)
          THEN disnam = disnam + 1;
        IF substr(firhc,1,1) = substr(firph,1,1) AND
           substr(firhc,2,1) = substr(firph,2,1) AND
           substr(firhc,3,1) = substr(firph,3,1) AND
           substr(firhc,4,1) = substr(firph,4,1) AND
           substr(firhc,5,1) = substr(firph,5,1)
          THEN disnam = disnam + 1;
        DISNAM2 = 0;
        IF substr(sechc,1,1) = substr(secph,1,1)
          THEN disnam2 = disnam2 + 1;
        IF substr(sechc,1,1) = substr(secph,1,1) AND
           substr(sechc,2,1) = substr(secph,2,1)
          THEN disnam2 = disnam2 + 1;
        IF substr(sechc,1,1) = substr(secph,1,1) AND
           substr(sechc,2,1) = substr(secph,2,1) AND
           substr(sechc,3,1) = substr(secph,3,1)
          THEN disnam2 = disnam2 + 1;
        IF substr(sechc,1,1) = substr(secph,1,1) AND
           substr(sechc,2,1) = substr(secph,2,1) AND
           substr(sechc,3,1) = substr(secph,3,1) AND
           substr(sechc,4,1) = substr(secph,4,1)
          THEN disnam2 = disnam2 + 1;
        IF substr(sechc,1,1) = substr(secph,1,1) AND
           substr(sechc,2,1) = substr(secph,2,1) AND
           substr(sechc,3,1) = substr(secph,3,1) AND
           substr(sechc,4,1) = substr(secph,4,1) AND
           substr(sechc,5,1) = substr(secph,5,1)
          THEN disnam2 = disnam2 + 1;
        DISADD = 0;
        IF substr(addhc,1,1) = substr(addph,1,1)
          THEN disadd = disadd + 1;
        IF substr(addhc,1,1) = substr(addph,1,1) AND
           substr(addhc,2,1) = substr(addph,2,1)
          THEN disadd = disadd + 1;
        IF substr(addhc,1,1) = substr(addph,1,1) AND
           substr(addhc,2,1) = substr(addph,2,1) AND
           substr(addhc,3,1) = substr(addph,3,1)
          THEN disadd = disadd + 1;
        IF substr(addhc,1,1) = substr(addph,1,1) AND
           substr(addhc,2,1) = substr(addph,2,1) AND
           substr(addhc,3,1) = substr(addph,3,1) AND
           substr(addhc,4,1) = substr(addph,4,1)
          THEN disadd = disadd + 1;
        IF substr(addhc,1,1) = substr(addph,1,1) AND
           substr(addhc,2,1) = substr(addph,2,1) AND
           substr(addhc,3,1) = substr(addph,3,1) AND
           substr(addhc,4,1) = substr(addph,4,1) AND
           substr(addhc,5,1) = substr(addph,5,1)
          THEN disadd = disadd + 1;
      proc sort;
        by last disadd disnam disnam2;
      %* KEEP THE HIGHEST SCORING CANDIDATE;
      data temp.hcphb&b;
        set temp.hcphb&b;
        by last disadd disnam disnam2;
        if last.last;

      %* SLOPE NOW INTO THE REMAINING OBS OF THE HCFA DATASET;
      %* BASE DATASET IS HCFA, THE ADDED DATA COMES FROM PROCD;
      %* CREATE COMMON MERGE NAME / ADDRESS VARS IN EACH DATASET OF LOW TO HIGH MATCH;
      %MACRO BANGER;
      %DO C = 2 %TO &LAST;
        DATA HC;
          SET EAS.HCFAB&B;
          IF _N_ = &C;
          stack=&c;
        DATA HCPH;
          MERGE HC PH;
          BY LAST;
        DATA temp.hcphbas;
          SET HCPH;
          MATCH = 0;
          IF HCFA = 1 AND PROCD = 1 THEN MATCH=1;
          IF MATCH=1 OR (MATCH=0 AND HCFA=1);
          DISNAM = 0;
          IF substr(firhc,1,1) = substr(firph,1,1)
            THEN disnam = disnam + 1;
          IF substr(firhc,1,1) = substr(firph,1,1) AND
             substr(firhc,2,1) = substr(firph,2,1)
            THEN disnam = disnam + 1;
          IF substr(firhc,1,1) = substr(firph,1,1) AND
             substr(firhc,2,1) = substr(firph,2,1) AND
             substr(firhc,3,1) = substr(firph,3,1)
            THEN disnam = disnam + 1;
          IF substr(firhc,1,1) = substr(firph,1,1) AND
             substr(firhc,2,1) = substr(firph,2,1) AND
             substr(firhc,3,1) = substr(firph,3,1) AND
             substr(firhc,4,1) = substr(firph,4,1)
            THEN disnam = disnam + 1;
          IF substr(firhc,1,1) = substr(firph,1,1) AND
             substr(firhc,2,1) = substr(firph,2,1) AND
             substr(firhc,3,1) = substr(firph,3,1) AND
             substr(firhc,4,1) = substr(firph,4,1) AND
             substr(firhc,5,1) = substr(firph,5,1)
            THEN disnam = disnam + 1;
          DISNAM2 = 0;
          IF substr(sechc,1,1) = substr(secph,1,1)
            THEN disnam2 = disnam2 + 1;
          IF substr(sechc,1,1) = substr(secph,1,1) AND
             substr(sechc,2,1) = substr(secph,2,1)
            THEN disnam2 = disnam2 + 1;
          IF substr(sechc,1,1) = substr(secph,1,1) AND
             substr(sechc,2,1) = substr(secph,2,1) AND
             substr(sechc,3,1) = substr(secph,3,1)
            THEN disnam2 = disnam2 + 1;
          IF substr(sechc,1,1) = substr(secph,1,1) AND
             substr(sechc,2,1) = substr(secph,2,1) AND
             substr(sechc,3,1) = substr(secph,3,1) AND
             substr(sechc,4,1) = substr(secph,4,1)
            THEN disnam2 = disnam2 + 1;
          IF substr(sechc,1,1) = substr(secph,1,1) AND
             substr(sechc,2,1) = substr(secph,2,1) AND
             substr(sechc,3,1) = substr(secph,3,1) AND
             substr(sechc,4,1) = substr(secph,4,1) AND
             substr(sechc,5,1) = substr(secph,5,1)
            THEN disnam2 = disnam2 + 1;
          DISADD = 0;
          IF substr(addhc,1,1) = substr(addph,1,1)
            THEN disadd = disadd + 1;
          IF substr(addhc,1,1) = substr(addph,1,1) AND
             substr(addhc,2,1) = substr(addph,2,1)
            THEN disadd = disadd + 1;
          IF substr(addhc,1,1) = substr(addph,1,1) AND
             substr(addhc,2,1) = substr(addph,2,1) AND
             substr(addhc,3,1) = substr(addph,3,1)
            THEN disadd = disadd + 1;
          IF substr(addhc,1,1) = substr(addph,1,1) AND
             substr(addhc,2,1) = substr(addph,2,1) AND
             substr(addhc,3,1) = substr(addph,3,1) AND
             substr(addhc,4,1) = substr(addph,4,1)
            THEN disadd = disadd + 1;
          IF substr(addhc,1,1) = substr(addph,1,1) AND
             substr(addhc,2,1) = substr(addph,2,1) AND
             substr(addhc,3,1) = substr(addph,3,1) AND
             substr(addhc,4,1) = substr(addph,4,1) AND
             substr(addhc,5,1) = substr(addph,5,1)
            THEN disadd = disadd + 1;
        proc sort;
          by last disadd disnam disnam2;
        data temp.add;
          set temp.hcphbas;
          by last disadd disnam disnam2;
          if last.last;
        PROC APPEND BASE = temp.hcphb&b DATA = temp.add;
      %end;
      %mend banger;
      %banger
    %END;
    %MEND SAMPLE;
    %SAMPLE

EVALUATION OF THE METHODS

At this point the reader can see, if you are still with me, that a set of descriptive, inferential, and graphical representations can be run to assess the nature of the merging that occurred.

    DATA EAS.SAMPLE2;
      SET TEMP.HCPHB1 TEMP.HCPHB2 TEMP.HCPHB3 TEMP.HCPHB4
          TEMP.HCPHB5 TEMP.HCPHB6 TEMP.HCPHB7 TEMP.HCPHB8
          TEMP.HCPHB9 TEMP.HCPHB10 TEMP.HCPHB11 TEMP.HCPHB13
          TEMP.HCPHB14 TEMP.HCPHB15 TEMP.HCPHB16 TEMP.HCPHB17;
      intage = int(age);
      CHANCE = 0;
      IF MATCH = 1 THEN DO;
        CHANCE = 0.5 * ((DISADD/CUTLHC) + (DISNAM+DISNAM2));
      END;
    PROC SORT;
      BY CUTLHC DESCENDING CHANCE;
    PROC FREQ;
      TITLE3 "FIRST HCFA - PHONEBOOK MERGE";
      TABLES MATCH;
      TABLES MATCH*DISADD*DISNAM;
      TABLES AGE80_01*MATCH / EXPECTED CHISQ;
      TABLES SEX*MATCH / EXPECTED CHISQ;
      TABLES RACE2*MATCH / EXPECTED CHISQ;
      format match mat. age80_01 age. SEX SEX. RACE2 RAC.;
    PROC TTEST;
      CLASS MATCH;
      VAR AGE;
      format match mat. age80_01 age. SEX SEX. RACE2 RAC.;
    PROC TTEST;
      CLASS SEX;
      VAR CHANCE;
      format match mat. age80_01 age. SEX SEX. RACE2 RAC.;
    PROC TTEST;
      CLASS AGE80_01;
      VAR CHANCE;
      format match mat. age80_01 age. SEX SEX. RACE2 RAC.;
    PROC UNIVARIATE PLOT NORMAL;
      VAR CHANCE AGE;
    proc print;
      VAR CHANCE DISADD DISNAM DISNAM2 NAMEH NAMEP ADDHC ADDPH PHONE;
      BY CUTLHC;
    RUN;

    *LIBNAME EAS "E:\DATA\EAS\HCFA98\DATA\";
    *LIBNAME TEMP "E:\TEMP\";
    LIBNAME EAS "D:\AECOM\EAS\HCFA98\DATA\";
    LIBNAME TEMP "F:\TEMP\";
    *GOPTIONS DEVICE=WIN GACCESS="SASGASTD>LPT1:" ROTATE=LSCAPE INTERPOL=JOIN;
    GOPTIONS DEVICE=HPLJ4SI GACCESS="SASGASTD>LPT1:" ROTATE=LSCAPE INTERPOL=JOIN;
    DATA EAS.SAMPLE2;
      SET EAS.SAMPLE2;
    PROC FORMAT;
      VALUE DEA 0="Alive" 1="Dead";
      VALUE AGE 0="<80" 1="80+";
      VALUE SEX 1="Male" 2="Female";
      VALUE RAC 1="White" 2="Black" 3="Other";
      VALUE MAT 0="No Match" 1="Match";
    PROC FREQ;
      TITLE2 "Einstein Aging Study for Richard Lipton";
      TITLE3 "HCFA-Phone Age Distribution";
      TABLES INTAGE;
    PROC univariate plot;
      TITLE2 "Einstein Aging Study for Richard Lipton";
      TITLE3 "HCFA-Phone Merge Confidence";
      var chance;
    pattern1 value=solid;
    axis1 color=blue width=2.0;
    axis2 color=blue width=2.0;
    axis3 color=blue width=2.0;
    proc gchart;
      TITLE2 "Einstein Aging Study for Richard Lipton";
      TITLE3 "HCFA-Phone Overall Age Distribution";
      vbar INTAGE / maxis=axis1 raxis=axis2 DISCRETE type=freq;
      WHERE INTAGE > 65;
    proc gchart;
      TITLE2 "Einstein Aging Study for Richard Lipton";
      TITLE3 "HCFA-Phone Age Density by Sex";
      vbar INTAGE / maxis=axis1 raxis=axis2 DISCRETE type=pct;
      by SEX;
      format sex sex.;
      where intage > 65;
    symbol1 c=default;
    proc gplot;
      TITLE2 "Einstein Aging Study for Richard Lipton";
      TITLE3 "HCFA-Phone Merge Confidence";
      plot CHANCE * INTAGE;
      WHERE INTAGE > 65;
    symbol1 c=default;
    symbol2 c=default l=34;
    proc gplot;
      TITLE2 "Einstein Aging Study for Richard Lipton";
      TITLE3 "HCFA-Phone Merge Confidence for each Sex";
      plot CHANCE * INTAGE = SEX;
      FORMAT SEX SEX.;
      WHERE INTAGE > 65;
    symbol1 c=default L=1;
    symbol2 c=default l=15;
    symbol3 c=default l=25;
    proc gplot;
      TITLE2 "Einstein Aging Study for Richard Lipton";
      TITLE3 "HCFA-Phone Merge Confidence for each Race";
      plot CHANCE * INTAGE = RACE2;
      WHERE INTAGE > 65;
      FORMAT RACE2 RAC.;
    run;
    quit;

CONCLUSION

Animation presents itself as a powerful and necessary

TRADEMARKS

SAS and SAS/MACRO are trademarks of the SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

ACKNOWLEDGMENTS

The following researchers (great scientists, terrible technical managers) have provided me with data and inspiration that serve as the basis of this presentation: Lucy Brown, PhD, Herman Buschke, MD, Richard Lipton, MD, MPH.

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the author at:

Haftan Eckholdt, PhD, MS
Albert Einstein College of Medicine
Department of Neurology
Kennedy Center
Pelham Parkway South
Bronx, New York
Work Phone: (718)
Fax: (718)
eckholdt@aecom.yu.edu


A Guided Tour Through the SAS Windowing Environment Casey Cantrell, Clarion Consulting, Los Angeles, CA

A Guided Tour Through the SAS Windowing Environment Casey Cantrell, Clarion Consulting, Los Angeles, CA A Guided Tour Through the SAS Windowing Environment Casey Cantrell, Clarion Consulting, Los Angeles, CA ABSTRACT The SAS system running in the Microsoft Windows environment contains a multitude of tools

More information

Prove QC Quality Create SAS Datasets from RTF Files Honghua Chen, OCKHAM, Cary, NC

Prove QC Quality Create SAS Datasets from RTF Files Honghua Chen, OCKHAM, Cary, NC Prove QC Quality Create SAS Datasets from RTF Files Honghua Chen, OCKHAM, Cary, NC ABSTRACT Since collecting drug trial data is expensive and affects human life, the FDA and most pharmaceutical company

More information

Procedure for Stamping Source File Information on SAS Output Elizabeth Molloy & Breda O'Connor, ICON Clinical Research

Procedure for Stamping Source File Information on SAS Output Elizabeth Molloy & Breda O'Connor, ICON Clinical Research Procedure for Stamping Source File Information on SAS Output Elizabeth Molloy & Breda O'Connor, ICON Clinical Research ABSTRACT In the course of producing a report for a clinical trial numerous drafts

More information

ODS in an Instant! Bernadette H. Johnson, The Blaze Group, Inc., Raleigh, NC

ODS in an Instant! Bernadette H. Johnson, The Blaze Group, Inc., Raleigh, NC Paper 210-28 ODS in an Instant! Bernadette H. Johnson, The Blaze Group, Inc., Raleigh, NC ABSTRACT Do you need to generate high impact word processor, printer- or web- ready output? Want to skip the SAS

More information

Telephone Survey Response: Effects of Cell Phones in Landline Households

Telephone Survey Response: Effects of Cell Phones in Landline Households Telephone Survey Response: Effects of Cell Phones in Landline Households Dennis Lambries* ¹, Michael Link², Robert Oldendick 1 ¹University of South Carolina, ²Centers for Disease Control and Prevention

More information

Introduction to SAS. Cristina Murray-Krezan Research Assistant Professor of Internal Medicine Biostatistician, CTSC

Introduction to SAS. Cristina Murray-Krezan Research Assistant Professor of Internal Medicine Biostatistician, CTSC Introduction to SAS Cristina Murray-Krezan Research Assistant Professor of Internal Medicine Biostatistician, CTSC cmurray-krezan@salud.unm.edu 20 August 2018 What is SAS? Statistical Analysis System,
