Use That SAP to Write Your Code Sandra Minjoe, Genentech, Inc., South San Francisco, CA


Paper DM09

ABSTRACT

In this electronic age, we usually receive detailed specifications from our biostatistician in an electronic statistical analysis plan (SAP). We often then turn around and retype much of these specs into our programs. This paper describes, with examples, how we can take those electronic specs from, say, a Microsoft Excel spreadsheet or Microsoft Word table and convert them into a dataset or code logic. This prevents us from introducing typos, because we no longer have to retype information already provided. The real savings come later on, though, when the specs change (as they so often do). By using the techniques outlined in this paper, when specs change we simply need to rerun, rather than recode.

INTRODUCTION

As statistical programmers working in the biotech/pharma industry, we're pushed to produce output quickly to get an FDA filing out the door as soon as possible. We must also verify that our code does what our biostatistician intended. Because timelines for filings are shortening, we need to find ways to reduce the time spent coding.

We are usually very rushed at the end of the project, but often have a bit more time to work with early on. It can thus be valuable to do a few extra steps early in the project that will then shorten the time required for making changes later. This paper proposes a solution that helps automate the process of bringing in lengthy specs and can thus prevent time-consuming last-minute code changes.

ISSUES WITH HARD CODING

At first, hard coding doesn't seem so bad. For example, assume our specs list a dozen terms that fall into one category:

    Category
    Term 1
    Term 2
    Term 3
    ...
    Term 12

It is quite simple to hard-code a statement to check for each of these terms, such as:

    if term in ('Term 1', 'Term 2', 'Term 3', ... 'Term 12') then ...

Then, when looking through our clinical data, this IF statement will easily determine whether the term matches any of the dozen specified.

This code is pretty straightforward. We really just need to verify that we typed each of the term names correctly and aren't introducing any typos. In fact, with electronic documents, we can even simply cut each term from the spec and paste it into the code.

This works fine for short lists of text. But what about when we have pages of text strings to search for? It would be easy to miss some of the strings if we have to deal with each one manually.

Another concern comes later on, when the specs change. Often what we get is just a new set of terms, and we have to read through them to determine what the differences are. We might find, for example, that two terms were removed, the spelling of one changed, and five terms were added. That's a lot of code checking, typing, and re-verifying to assure that we captured all those changes correctly. And what happens if we missed a change?

Now multiply this by the many other hard-coded specs that our programs contain, and it is easy to see how last-minute spec changes can become a huge nightmare for us!

RECOMMENDED TECHNIQUE

Instead of traditional hard-coding, what we need is a method that allows for last-minute changes of specs without a lot of time spent on recoding. The solution outlined here is a partially automated technique, where some of the work is done outside of SAS, and some with SAS code.
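For reference, the hard-coded check described under ISSUES WITH HARD CODING might look like the following in a complete DATA step. This is only a minimal sketch: the dataset names WORK.STUDY and WORK.FLAGGED, the variable IN_CATEGORY, and the three sample records are assumptions made for illustration and do not come from any real spec.

```sas
/* Hypothetical study data: one reported term per record. */
data work.study;
  infile datalines truncover;
  input term $8.;
  datalines;
Term 1
Term 2
Term 13
;
run;

/* Hard-coded check: every spec change means editing this list by hand. */
data work.flagged;
  set work.study;
  /* IN_CATEGORY is 1 when TERM matches a listed value, otherwise 0. */
  in_category = (term in ('Term 1', 'Term 2', 'Term 3'));
run;
```

With only three terms the list is easy to verify by eye; the maintenance burden discussed above appears when the list runs to pages and is revised late in the project.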

I've broken this down into four basic steps:

1. Convert our specs into a Microsoft Excel spreadsheet, with a column for each type of text or instruction.
2. Save this as a file that SAS can read, on the operating system we are using for our analysis. (In my case this is a CSV file on UNIX.)
3. Convert the file into a SAS dataset.
4. Use the SAS dataset to derive code we can use for checking against our study data.

Depending on your needs, the structure of your incoming data, and the operating system(s) used, some of these steps may not be required. Over the next couple of pages are some examples that walk through how this can be done, starting from the simple and adding complexity.

EXAMPLE 1: Going from a Microsoft Excel table to a SAS dataset

We receive a supplemental spec in Microsoft Excel titled Group Terms.xls that is laid out as:

    Term
    Aaaaa
    Bbbbb
    Ccccc
    Ddddd

We want to put this information into a SAS dataset that we can then use in our program.

Step 1: We don't need to convert our specs into a spreadsheet, as they were already provided in this format.

Step 2: We want to save this Microsoft Excel file in a comma-delimited format as TERMS.CSV. (Note that if we're working on PC SAS, a Microsoft Excel file can be easily imported. This step is really only necessary if we can't directly import the Microsoft Excel file, as with UNIX SAS.) To create the CSV file, we can make use of menus in Microsoft Excel:

    - Click on File -> Save As.
    - Choose our folder (in my case this would be a UNIX directory).
    - Type in the filename TERMS.CSV.
    - Select, from the Save As Type pull-down menu, CSV (Comma delimited) (*.csv).
    - We will get a warning about not being able to save multiple sheets. Since we need only the active sheet for our work, just click OK.
    - We will get a warning about not being able to save all the features used in a Microsoft Excel spreadsheet. We need only those that the CSV file supports, so click OK.

The document will still look like a Microsoft Excel spreadsheet even after saving it as a CSV file. This is all the work we need to do in Microsoft Excel. Note that when we close the document or exit Microsoft Excel, it expects us to save it as an .xls file. So even if it warns us, we don't need to save again, because the document already exists as both an .xls file (the original) and a .csv file.

Step 3: We want to import this CSV file to create a SAS dataset TERMS.SAS7BDAT with the variable called GROUP1.

The following code will accomplish this:

    data WORK.TERMS;
      %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
      infile '../Terms.csv' delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2;
      informat GROUP1 $8. ;
      format GROUP1 $8. ;
      input GROUP1 $;
      * set ERROR detection macro variable;
      if _ERROR_ then call symput('_efierr_',1);
    run;

(Note: if you're like me and you don't import from non-SAS files much, you may want to use the SAS Import Wizard. From the display manager, start by clicking on File -> Import Data, and it will walk you through a few screens. You have the option of saving the code from these windows, or you can use Recall Last Submit to bring it into the editor. This is how I generated the above code.)

We can import a Microsoft Excel file in this same way. Be aware, though, that a Microsoft Excel file often doesn't cross operating systems well, which is why we created the CSV file in the previous step.

Once this code is written (or generated by using the Wizard), we must save it, in case the specs are later modified and we need to do this step again.

Step 4: Because, in this case, the final product we need is a SAS dataset, we don't need to convert the dataset into code.

Now that the data is in a SAS dataset, we can use it as we would any other SAS data. And because we saved the SAS program that performs the import, if (when) the list of terms changes later on, all we have to do is repeat Step 2 (if we're working on an operating system other than PC) and re-run the program from Step 3.

EXAMPLE 2: Going from a Microsoft Excel file to a macro variable

Let's say that instead of a SAS dataset, what we really need to do is generate a list of text strings that we can search through to determine whether our clinical data matches any one of them. In other words, something like:

    if term in ("Aaaaa" "Bbbbb" "Ccccc" ...) then ...
If we have the same data in Microsoft Excel as we did in Example 1, we follow Steps 1-3 as above, but also need to perform Step 4 to generate the SAS code.

Step 1: We don't need to convert our specs into a spreadsheet, as they were already provided in this format.

Step 2: To save this Microsoft Excel file in a comma-delimited format as TERMS.CSV, we can make use of menus in Microsoft Excel, as shown in Step 2 of Example 1.

Step 3: To import this CSV file and create the SAS dataset TERMS.SAS7BDAT with the variable called GROUP1, we can use the code block shown in Step 3 of Example 1.

Step 4: We need to write code to create a list of text strings that can be searched, using the data now in our one-column SAS dataset generated in Step 3. One way to create this list is to first concatenate all the text from these records together into one long text field, and then put this into a macro variable. The following PROC SQL step will accomplish most of this:

    * Create macro variable to hold all Group 1 terms, based on specs from Dr. X titled
    * 'Group Terms.xls' and converted to SAS dataset terms in the input directory.;
    proc sql;
      select group1 into : grp1list separated by '" "'
      from work.terms;
    quit;

This generates one long macro variable, &grp1list, that contains the text Aaaaa" "Bbbbb" "Ccccc.... Because PROC SQL puts separators only between text strings and not before or after them, the value does not contain the double quote that should be to the left of the first text string, nor the one to the right of the last text string. It also doesn't contain parentheses, which will prove useful later on. We can easily add these onto our macro variable with the following:

    * Add parentheses and quotes to left of first and right of last string;
    %let grp1list = ("&grp1list");

This generates our final macro variable, &grp1list, that contains the text ("Aaaaa" "Bbbbb" "Ccccc" ...). That is, it lists, within parentheses, all the terms we imported, each in quotes and separated from each other by a single space.

Later, when we want to check the terms from our clinical data against those given to us in Group 1, we can simply say:

    if term in &grp1list then ...

This macro variable resolves to the code:

    if term in ("Aaaaa" "Bbbbb" "Ccccc" ...) then ...

Note: many programmers include a comma between items in a list. This is not required, and for clarity I chose not to use them here.

Once again, because we save the SAS program(s) that do the import and create the macro variable, if (when) the list of terms changes later on, all we have to do is repeat Step 2 and re-run the code from Steps 3-4.

EXAMPLE 3: Going from a Microsoft Excel file to macro variables

This same idea can be expanded to Microsoft Excel spreadsheets with multiple columns. Suppose what we have is a spreadsheet with three columns, each specifying a different group, laid out as:

    Group 1   Group 2   Group 3
    Aaaaa     Abcde     Abbbb
    Bbbbb     Bcdef     Bcccc
    Ccccc     Cdefg     Cdddd
    Ddddd     Defgh     Deeee
    ...

Steps 1 and 2 remain the same as before.

Step 1: We don't need to convert our specs into a spreadsheet, as they were already provided in this format.

Step 2: To save this Microsoft Excel file in a comma-delimited format as TERMS.CSV, we can make use of menus in Microsoft Excel, as shown in Step 2 of Example 1. This exact process will create a CSV file from a Microsoft Excel file with any number of columns.

Step 3: We now have to import three columns, not just one.
We can use the SAS Import Wizard to help us import the three columns, similar to Step 3 in Examples 1 and 2. We end up with code such as the following:

    data WORK.TERMS;
      %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
      infile '../Terms.csv' delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2;
      informat GROUP1-GROUP3 $8. ;
      format GROUP1-GROUP3 $8. ;
      input GROUP1 $ GROUP2 $ GROUP3 $;
      * set ERROR detection macro variable;
      if _ERROR_ then call symput('_efierr_',1);
    run;

Our dataset TERMS.SAS7BDAT now contains columns for Group1, Group2, and Group3. If our need was simply for a SAS dataset, we could stop here. Since we want to create macro variables, we must go on to Step 4.
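To try the Step 3 import without a real spec file, the following sketch writes a small three-column CSV to a temporary file and reads it back with the same INFILE options. The fileref DEMO, the dataset WORK.TERMS_DEMO, and the sample rows (including the deliberately short Group 2 and Group 3 columns) are assumptions made for illustration only.

```sas
/* Assumption: a temporary file stands in for ../Terms.csv. */
filename demo temp;

data _null_;
  file demo;
  put 'Group 1,Group 2,Group 3';
  put 'Aaaaa,Abcde,Abbbb';
  put 'Bbbbb,Bcdef,Bcccc';
  put 'Ccccc,,Cdddd';   /* Group 2 has run out of terms */
  put 'Ddddd,,';        /* Groups 2 and 3 have both run out */
run;

/* Same INFILE options as the import step; FIRSTOBS=2 skips the header row. */
data work.terms_demo;
  infile demo delimiter=',' missover dsd lrecl=32767 firstobs=2;
  informat group1-group3 $8.;
  input group1 $ group2 $ group3 $;
run;

/* COUNT(column) counts only non-missing values, so it shows how many
   terms each group actually received. */
proc sql;
  select count(*)      as n_rows,
         count(group1) as n_group1,
         count(group2) as n_group2,
         count(group3) as n_group3
    from work.terms_demo;
quit;
```

Because the columns hold different numbers of terms, the shorter columns end up with missing values on the later rows. That is why a list built from this dataset needs to skip missing values.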

Step 4: We need to create a set of character strings that can be searched, one for each of the columns in the dataset generated in Step 3. Similar to Step 4 in Example 2, we use PROC SQL to pull together all the different strings for a group into a long text string and then add the parentheses and outside quotes with %LET statements. The following code, an enhanced version of what we used in Step 4 of Example 2, will do this:

    * Create macro variables to hold all the non-missing Group 1-3 terms, based on specs
    * from Dr. X titled 'Group Terms.xls' and converted to SAS dataset terms in the input
    * directory.;
    proc sql;
      select group1 into : grp1list separated by '" "'
      from work.terms where group1 > '';
      select group2 into : grp2list separated by '" "'
      from work.terms where group2 > '';
      select group3 into : grp3list separated by '" "'
      from work.terms where group3 > '';
    quit;

    * Add parentheses and quotes to left of first and right of last string;
    %let grp1list = ("&grp1list");
    %let grp2list = ("&grp2list");
    %let grp3list = ("&grp3list");

Note that because there will likely be different numbers of terms in each column, there is now a WHERE clause in each PROC SQL query to check whether the field is non-missing before including it in the list.

These macro variables can now be used in our code. If we want to check the terms from our clinical data against those given to us in groups 1, 2, or 3, we can simply say:

    if term in &grp1list then ...;
    else if term in &grp2list then ...;
    else if term in &grp3list then ...;

These macro variables resolve as follows:

    if term in ("Aaaaa" "Bbbbb" "Ccccc" ...) then ...;
    else if term in ("Abcde" "Bcdef" "Cdefg" ...) then ...;
    else if term in ("Abbbb" "Bcccc" "Cdddd" ...) then ...;

EXAMPLE 4: Going from a Microsoft Word table to macro variables

With only one extra step, we can even tackle specs given to us in a Microsoft Word table, by first converting to Microsoft Excel and then to a SAS dataset.
Suppose that instead of a Microsoft Excel spreadsheet, what we have is a Microsoft Word table with three columns, each specifying a different group, laid out as:

    Group 1   Group 2   Group 3
    Aaaaa     Abcde     Abbbb
    Bbbbb     Bcdef     Bcccc
    Ccccc     Cdefg     Cdddd
    Ddddd     Defgh     Deeee
    ...

Step 1: Before we can bring this into SAS, we first convert it from a Microsoft Word table to a Microsoft Excel spreadsheet. Within Microsoft Word, select the entire table, including the header row, and copy it to the clipboard (Ctrl+C). Open a blank Microsoft Excel worksheet, move to cell A1, and paste the table (Ctrl+V). The column widths won't match those in the Microsoft Word table, but all the information will be there.

Steps 2-4 remain the same as before.

Step 2: To save this Microsoft Excel file in a comma-delimited format as TERMS.CSV, we can make use of menus in Microsoft Excel, as shown in Step 2 of Example 1.

Step 3: To create the SAS dataset, we need to run the program shown in Step 3 of Example 3.

Step 4: To derive the code as macro variables, we need to run the program shown in Step 4 of Example 3.

EXAMPLE 5: Going from other text to macro variables

If our specs are in text format but not in a table, we have a little more work to do in Step 1.

Step 1: First we need to get our specs into Microsoft Word, if they're not already there. For example, if they are in the text of an email, select the text that needs to be used and copy it to the clipboard (Ctrl+C). Open a blank Microsoft Word document and paste the text (Ctrl+V). From this point we can use Microsoft Word to make this document look like a list, by doing things such as fixing line breaks.

After the text is in Microsoft Word, we need to convert it into a table. Microsoft Word can do most of the dirty work for us here. Click on the pull-down menus Table -> Convert -> Text to Table. From that menu, choose the number of columns to create and the delimiter. The data may need a little manual tweaking before it converts into a clean table.

Once the table looks right, we probably want to save this document. In fact, it wouldn't be a bad idea to send it back to our biostatistician in this form, just to confirm that it matches what they intended. That way, if changes need to be made later on, they can be made directly to this Microsoft Word document, rather than to the original spec.

We now need to copy this data to Microsoft Excel. First we select the entire table, including the header row, and copy it to the clipboard (Ctrl+C). We then open a blank Microsoft Excel worksheet, move to cell A1, and paste the table (Ctrl+V). The column widths won't match those in the Microsoft Word table, but all the information will be there.
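When the free text is a simple delimited string, an alternative to the Word Text to Table route is to split it inside SAS. This is a swapped-in technique and not the method the paper describes. The sketch below assumes a comma-delimited string and a SAS release that provides the COUNTW function; the dataset WORK.TERMS_FROM_TEXT and the sample string are hypothetical.

```sas
/* Hypothetical text pasted from an email: one comma-separated list of terms. */
data work.terms_from_text;
  length spec_text $200 group1 $8;
  spec_text = 'Aaaaa, Bbbbb, Ccccc, Ddddd';

  /* COUNTW counts the comma-delimited pieces; SCAN extracts the i-th piece. */
  do i = 1 to countw(spec_text, ',');
    group1 = strip(scan(spec_text, i, ','));  /* STRIP removes the blank after each comma */
    output;                                   /* one record per term */
  end;

  keep group1;
run;
```

This route yields a one-column SAS dataset directly, so only Step 4 would remain. The trade-off is that no Word table is produced for the biostatistician to review, which is the main benefit of the conversion described above.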
Steps 2-4 remain the same as before:

Step 2: To save this Microsoft Excel file in a comma-delimited format as TERMS.CSV, we can make use of menus in Microsoft Excel, as shown in Step 2 of Example 1.

Step 3: To create the SAS dataset, we need to run the program shown in Step 3 of Example 3.

Step 4: To derive the code as macro variables, we need to run the program shown in Step 4 of Example 3.

A REAL USE OF THIS SYSTEM

I received a spec from a biostatistician that contained 4 different groups of medical history terms to search for. The Microsoft Word spec, given to me in table form, looked something like this:

    History    Search Type   Search String
    DIABETES   EXACT         DM
    DIABETES   INCLUDE       AODM
    DIABETES   INCLUDE       DIABETES
    DIABETES   EXCLUDE       GESTATIONAL
    ...

I wanted to end up with code that would allow me to look at each medical history term in my clinical data (variable HXDES) and determine whether it was in one of these categories. In other words, code such as:

    if hxdes in ("DM" ...)
       or ((index(hxdes, "AODM") or index(hxdes, "DIABETES") ...)
           and not (index(hxdes, "GESTATIONAL") or ...))
    then ...

This spec document was 5 pages long, and many of the terms and partial terms in the Search String column were lengthy. I didn't want to manually retype this information into code such as this. The copy/paste approach would have helped prevent typos in my SAS program, but would not be easy to handle later, when the specs were updated. Instead, I used the approach described earlier.

Step 1: First I copied each of these Microsoft Word tables to a Microsoft Excel spreadsheet, so that I ended up with 4 spreadsheets.

Step 2: I then saved each of these Microsoft Excel spreadsheets as a CSV file, so that I ended up with 4 CSV files on UNIX.

Step 3: I brought each of these CSV files into SAS to derive a SAS dataset, using code similar to this:

    data WORK.htn;
      %let _EFIERR_ = 0; /* set the ERROR detection macro variable */
      infile 'diabetes.csv' delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=3;
      * SET UP INFORMATS AND FORMATS;
      informat Hx_Term $15. Type $11. String $75.;
      format Hx_Term $15. Type $11. String $75.;
      * BRING IN THE DATA AND CHECK FOR ERRORS;
      input Hx_Term $ Type $ String $;
      if _ERROR_ then call symput('_efierr_',1);
      * KEEP ONLY NON-BLANK RECORDS (RECORDS WITH BLANK HX_TERM ARE ALL BLANK);
      if hx_term > '' then output;
    run;

I used the Import Wizard to help me create this code. After running it, I ended up with 4 SAS datasets, each with columns Hx_Term, Type, and String, and with a row for each of the terms to search against.

Reviewing the specs, I noted that the terms marked INCLUDE or EXCLUDE needed to be handled differently from those marked EXACT. In these same programs, I made use of the variable TYPE to derive new variables that contain some of the code I would later use to check my medical history terms.
    * CREATE SAS CODE FOR IF STATEMENT FROM DATA;
    format code $16.;
    if upcase(type) in ('INCLUDE' 'EXCLUDE') then do;
      if first.type then code = 'index(hxdes,';
      else code = 'or index(hxdes,';
    end;
    else if first.type then code = 'hxdes in (';

(FIRST.TYPE is available only under BY-group processing, so this fragment assumes a BY statement that includes TYPE.)

Step 4: To derive the code strings I would later use to make the comparisons, I wrote code similar to the following:

    * BRING IN FOUR MEDICAL HISTORY DATASETS AND CREATE MACRO VARIABLES;
    data _null_;
      set isso.arth isso.dm isso.htn isso.v_throm;
      by hx_term;

      * CREATE LARGE TEXT VARIABLES TO HOLD STRINGS WHILE CREATING;
      format exact excl incl $5000.;
      retain exact excl incl;

      * CREATE ARRAYS TO LOOP THROUGH 3 DERIVED CODE STRINGS;
      array _outstr (3) exact excl incl;
      array _type (3) $10. _TEMPORARY_ ('EXACT' 'EXCLUDE' 'INCLUDE');

      * RESET AT START OF EACH NEW DATASET;
      if first.hx_term then do i = 1 to 3;
        _outstr(i) = '';
      end;

      * APPEND INDIVIDUAL STRING TO APPROPRIATE LONG CODE STRING;
      do i = 1 to 3;
        if type = _type(i) then do;
          * CONCATENATE THE CODE AND STRING INTO MEANINGFUL IF CONDITION;
          if i = 1 then _outstr(i) = left(trim(_outstr(i)) || ' ' || trim(code) || trim(string));
          else _outstr(i) = left(trim(_outstr(i)) || ' ' || trim(code) || ' ' || trim(string) || ')');
        end;
      end;

I then derived 12 macro variables, one for each of the 3 types of checks in the 4 types of medical history categories. For this, I used code such as:

    call symput ("ar_exact", exact);

All of this code was incorporated into a program that was never run independently. Instead, it was included into each program that searched through the medical history terms. In these calling programs, after the macro variables were created, I was then able to do my searches. My code looked something like the following:

    if &ar_exact or ((&ar_incl) and not (&ar_excl)) then ...

which then resolved to:

    if hxdes in ("DM" ...)
       or ((index(hxdes, "AODM") or index(hxdes, "DIABETES") ...)
           and not (index(hxdes, "GESTATIONAL") or ...))
    then ...

The fragments shown here are excerpts; details such as quoting each search string and closing the parenthesis of the EXACT list are not shown.

Because of the multi-step nature of this process, I created (for myself and others) some documentation that noted the locations of all of these files and programs, and the order in which to do all the steps. This documentation basically specified that when the specs changed, we should redo Step 1 (creating the Microsoft Excel spreadsheets) and Step 2 (creating the CSV files), rerun the program in Step 3 (to create the SAS datasets), and finally rerun all of the programs that used the included code shown in Step 4 (so that they would be using the most current information).

When the specs changed, not long before all the output was due for submission, this whole process took me just a couple of minutes, plus another short period of time to have the results re-verified.

SUMMARY

We've seen five general examples, each of increasing complexity, plus a real example of how I've used this method to get electronic specs into our code.
Here is a summary of those steps:

1. Convert our specs into a spreadsheet format, with a column for each type of text or instruction. If our specs are in a Microsoft Word table, a simple copy and paste into Microsoft Excel can do this. If they are in another text form, we must first convert them to a Microsoft Word table.
2. Save this as a file that SAS can read, such as a comma-delimited file (*.csv). Be sure to save the file to the platform where we do our analysis (such as UNIX). Note that this step may not be required on all operating systems.
3. Convert the data into a SAS dataset. We can use the SAS Import Wizard to help us write this code.
4. Convert the SAS dataset into code we can use for checking against our study data. The examples created macro variables that concatenate together all the text strings to be searched.

CONCLUSION

Whenever we have detailed electronic specs that we need to get into our code, we should consider bringing in those specs instead of manually typing them. A huge advantage is that this code doesn't need to be updated when specs change; we simply re-do the step(s) that create the SAS dataset and code. Code such as this lasts a long time for exactly that reason.

Because we're doing more work in Microsoft Word, Microsoft Excel, and importing than we would if we had just hard-coded manually, it might even take us a bit longer to get this system set up and write our code. We should probably also write some documentation to explain the process. However, in many projects we have more time at the beginning and less time at the end, so this could work to our advantage. Also, once this process is set up, it can be used again and again for other similar work in the same or even different projects.

The intent is that the savings gained in handling changes at the end of the study will be worth any extra time spent up front in going through this process.
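The last two steps summarized above can be condensed into a small self-contained sketch. It is not the paper's exact code: it uses the QUOTE function inside PROC SQL to wrap each term in double quotes, as an alternative to the quoted separator, and it stands in for the imported spec with inline data. The datasets WORK.TERMS and WORK.STUDY, the variable IN_GROUP1, and all sample values are assumptions for illustration.

```sas
/* Stand-in for Step 3: a one-column dataset of spec terms. */
data work.terms;
  length group1 $8;
  input group1 $;
  datalines;
Aaaaa
Bbbbb
Ccccc
;
run;

/* Step 4: build the search list. QUOTE wraps each term in double quotes,
   so a plain blank can be used as the separator. */
proc sql noprint;
  select quote(strip(group1))
    into :grp1list separated by ' '
    from work.terms
    where group1 > '';
quit;

/* Add the surrounding parentheses and show the result in the log. */
%let grp1list = (&grp1list);
%put NOTE: grp1list resolves to &grp1list;

/* Hypothetical study data checked against the generated list. */
data work.study;
  length term $8;
  input term $;
  in_group1 = (term in &grp1list);  /* 1 if TERM is in the spec list, else 0 */
  datalines;
Aaaaa
Zzzzz
;
run;
```

When the spec changes, only the contents of WORK.TERMS change; the PROC SQL step and the check that uses &grp1list are rerun as they are. One caveat: if the query selects no rows, PROC SQL does not create the macro variable, so an empty spec would need separate handling.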

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the author at:

    Sandra Minjoe
    Genentech, Inc.
    1 DNA Way
    South San Francisco, CA 94080
    (650) 225-4733
    fax: (650) 225-4611
    email: sminjoe@gene.com