PROC FORMAT: USE OF THE CNTLIN OPTION FOR EFFICIENT PROGRAMMING

Karuna Nerurkar and Andrea Robertson, GMIS Inc.

ABSTRACT

Proc Format can be a useful tool for improving programming efficiency. This paper demonstrates how to use Proc Format with the Cntlin option to read in an external dataset and convert it into a custom format. It discusses pitfalls to avoid when using Proc Format, such as using a numeric variable containing a decimal as your LABEL. It also demonstrates the use of Proc Format as a more efficient mechanism than sorting and merging when performing a table lookup (i.e., using a key variable to extract observations from one dataset based on the values of the key variable in a second dataset).

INTRODUCTION

The Format procedure is a useful tool when the selection of records from one dataset depends on information contained in a second dataset. This is commonly called a table lookup. One requirement for successfully performing a table lookup is that both datasets contain one common element, or key. First, this paper provides a detailed explanation of how to perform a table lookup using the Format procedure. Then it compares the Format procedure table lookup to a Sort/Merge table lookup to illustrate the gain in efficiency that the Format procedure achieves over the Sort/Merge process when a table lookup is performed on large datasets. The paper also provides tips on avoiding pitfalls when creating the format, and discusses memory limitations associated with the Format procedure.

THE DATA

The monthly processing of health insurance claims data will be used as an example in which a table lookup can be done using the Format procedure. In this situation, a history file containing claims data may consist of millions of claims for millions of members.
Updating all of the claims in this file on a monthly basis, which would consist of combining newly entered claims with those in the history file, cleansing the data, and performing any desired enhancements, would be costly in terms of CPU and execution time. Since the new month of claims data will only contain records for a subset of the members in the history file, there is no need to reprocess the entire history file. Only the history records for those members who have services reflected in the new month of claims data need to be selected for reprocessing. This is where the Format procedure can be a useful tool.

USING THE FORMAT PROCEDURE TO PERFORM A TABLE LOOKUP

In order to create a custom format beginning with a SAS dataset (the input control dataset), the Format procedure expects to find three variables within the dataset:

1. START - Range starting variable
2. LABEL - Label: informatted or formatted value
3. FMTNAME - Format or informat name

The variable TYPE is not required, but if it is not included, the name of a character format must be preceded by a dollar sign ($). An additional variable, END, is also optional. START and END specify a range of values that will be assigned a given LABEL value. If END is absent from the input control dataset, SAS assigns it the same value as START (in essence, creating a range containing only one value). FMTNAME is the name of the format being created.

The format name, when used (as described below) in conjunction with a key variable in your lookup dataset, executes a comparison of the values of that key variable with the values of the START variable (or the START-END range) contained in the format. When a match between the value of the key variable and START is found, the Data step returns the value of LABEL that is paired with that value of START. Other variables can be specified in the input control dataset but are not required.
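
As a minimal sketch of the control dataset layout described above, the following builds a small numeric format from START, END, LABEL, FMTNAME, and TYPE. The dataset name CTRL, the format name AGEGRP, and the age ranges are hypothetical and are not taken from the paper; they only illustrate how each row of the control dataset becomes one range of the format.

```sas
/* Hypothetical control dataset: one row per range of a numeric format. */
data ctrl;
   length fmtname $8 type $1 label $10;
   retain fmtname 'AGEGRP'   /* name of the format to create      */
          type    'N';       /* 'N' = numeric format, 'C' = char  */
   input start end label $;  /* START and END bound each range    */
   datalines;
0 17 Minor
18 64 Adult
65 120 Senior
;
run;

/* Convert the control dataset into a format in the WORK library. */
proc format cntlin=ctrl;
run;

/* Apply the format: 42 falls in the 18-64 range, so GROUP is 'Adult'. */
data _null_;
   age = 42;
   group = put(age, agegrp.);
   put group=;
run;
```

Because TYPE is supplied here, the format name needs no prefix; for a character format without TYPE, the name would carry the leading dollar sign as the text explains.
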
Refer to the SAS Procedures Guide for a listing of the optional control dataset variables.

Using the health insurance claims data as an example, the information contained in the history file for those members who have received services this month needs to be updated. A key that uniquely identifies members must exist in both datasets. Both the history file (the master file) and the current month file (the transaction file) contain, along with other variables, each member's social security number, which will be used as the key. Since records from the history file that match members represented in the current month file must be selected, the key variable will be used to create the format. Since the format serves only to decide whether a record is selected or not selected, the same value will be assigned to LABEL for all values of the START variable. Additionally, since a single format is being created, the same FMTNAME will be assigned to all observations.

For the sake of the example, a small test dataset will be used.

IN.HIS94 (history file):
CHARGES 9344 45.76 45.76 300.00 285.00 75.00 5.50 243.30 25.5 87.00 84.75 233675,5.40 780.75

IN2.CURR0794 (current month file):
CHARGES 25.00 25.75 2.75 76.34 20.85 00.50 2.75 72.45

The first step is to create a dataset containing the variables START, LABEL, and FMTNAME. This can be done using the following SAS statements:

Example 1 Step 1:

   DATA TEST;
      RETAIN LABEL '' FMTNAME '';
      SET IN2.CURR0794 (KEEP= RENAME=(=START));

The above example creates a dataset named TEST containing the following observations:
START LABEL FMTNAME

The next step would be to execute the Format procedure, which converts this dataset into a format. However, if this dataset were used in its current form, the Format procedure would fail because overlapping ranges exist. Each value (range) of the START variable must be unique in order for the Format procedure to create a format for that variable. One way to avoid overlapping ranges is to sort the dataset TEST by the key variable using the NODUPKEY option so that only one observation per member remains in the final dataset. The sort is illustrated below:

Example 1 Step 2:

   PROC SORT DATA=TEST OUT=TEST2 NODUPKEY;
      BY START;

This code creates a dataset called TEST2 containing the following observations:
START LABEL FMTNAME

Proc Format with the Cntlin option is used to create the format.
If you are creating a temporary format (i.e., one that will exist only while the program is running), use the following SAS statement:

Example 1 Step 3A:

   PROC FORMAT CNTLIN=TEST2;

This reads in the dataset TEST2 (the input control dataset) and creates the format in the WORK library. If you want the format to be available beyond the completion of the current job, you must create a permanent format library to house the format:

Example 1 Step 3B:

   LIBNAME MYLIB 'MYACCT.MEMB.NEWMON' DISP=(NEW,CATLG) UNIT=DASD
           RETPD=365 SPACE=(TRK,(0,5),RLSE);
   PROC FORMAT CNTLIN=TEST2 LIBRARY=MYLIB;

Note that the above example was structured for use in MVS. The library allocation statement will need to be modified to conform to the operating system that is being used. This example reads the dataset TEST2 and creates the format as a member of the permanent format library called 'MYACCT.MEMB.NEWMON'.

The next step in performing the table lookup is to read in the history file, using the format that was created to select only the desired records. This can be accomplished in several ways, one of which is illustrated below:

Example 2:

   DATA OUT.HIS94NEW;
      SET IN.HIS94 (WHERE=(PUT(,.)=''))
          IN2.CURR0794;
      [statements]

This data step evaluates each record in IN.HIS94 before reading it into the output dataset to determine whether the record satisfies the WHERE clause. Additionally, all records in IN2.CURR0794 are read into the data step, since they will be added to the history file. When a match is found between the key value in the history file and a value of the START variable in the format (which represents the key values contained in the current month of data), the value of the LABEL (in this case '') is returned, which satisfies the WHERE clause. When no match is found between the key value and any value of the START variable, the first n bytes of the key value are returned, with n being equal to the length of the LABEL variable. Each record in IN.HIS94 is evaluated for selection as shown below:

9344 233675 Returned Value 93 24 24 2336 Select (Y/N)?

This completes the steps necessary for performing a table lookup using the Format procedure. Note that IN2.CURR0794 need not be included in the SET statement in order for a table lookup to be performed. When the history dataset and the current month dataset are combined as shown in Example 2, the following records will be output to OUT.HIS94NEW:

CHARGE 300.00 87.00 25.00 25.75 2.75 76.34 285.00 84.75 20.85 00.50 2.75 72.45

COMPARISON OF FORMAT WITH SORT/MERGE

Thus far, this paper has discussed how to use the Format procedure to perform a table lookup.
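
As a compact recap of that procedure, the sketch below runs the whole format-based lookup end to end on a few made-up rows. Everything named here is an assumption for illustration only: the WORK datasets HISTORY, CURRENT, KEYS, CTRL, and COMBINED, the key variable name SSN, the format name KEEPF, and the data values do not come from the paper.

```sas
/* Hypothetical miniature master (history) file. */
data history;
   input ssn :$9. charges;
   datalines;
111111111 45.76
222222222 300.00
333333333 75.00
222222222 87.00
;
run;

/* Hypothetical miniature transaction (current month) file. */
data current;
   input ssn :$9. charges;
   datalines;
222222222 25.00
444444444 12.75
222222222 76.34
;
run;

/* One row per distinct key; NODUPKEY avoids overlapping ranges. */
proc sort data=current(keep=ssn) out=keys nodupkey;
   by ssn;
run;

/* Control dataset: START = key value, constant LABEL and FMTNAME. */
data ctrl;
   set keys(rename=(ssn=start));
   retain fmtname 'KEEPF' type 'C' label 'YES';
run;

proc format cntlin=ctrl;
run;

/* Keep only history records whose key is in the format, */
/* then append every record from the current month.      */
data combined;
   set history(where=(put(ssn, $keepf.) = 'YES'))
       current;
run;
```

With these made-up rows, only the two history records for key 222222222 should pass the WHERE clause, so COMBINED should hold those two records followed by the three current-month records. The three-byte label 'YES' is chosen so that it cannot collide with the leading bytes of an all-digit key.
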
A table lookup can also be accomplished by sorting and merging the two datasets. Consider the same example of updating the history file with data from a current month file. If the two files have a common key, then a table lookup can be performed. For the monthly and history files, this key is the social security number variable. In order to perform a merge, both files must be pre-sorted by the key.

The first step is to sort the monthly file by the key and remove observations with duplicate key values. The NODUPKEY option must be used in order to avoid the duplication of records that can occur in a many-to-many merge.

Example 3 Step 1:

   PROC SORT DATA=IN2.CURR0794(KEEP=) OUT=UNIQUE NODUPKEY;
      BY ;

Sorting the monthly file this way creates the dataset UNIQUE. Note that UNIQUE should contain only the key variable. This avoids overwriting values of similarly named variables when this dataset is merged with the history file.

The next step is to sort the history file by the same key variable. Keep all the records in the history file. Do not drop any variables.

Example 3 Step 2:

   PROC SORT DATA=IN.HIS94 OUT=TEMP.HISA;
      BY ;

When IN.HIS94 is sorted, the following dataset is created:

233675 9344 CHARGES 300.00 75.00 243.30,5.40 87.00 45.76 285.00 5.50 25.5 780.75 84.75 45.76

To select records from the history file for members who also appear in the current monthly file, merge the history file with the UNIQUE file that was created from the monthly file.

Example 3 Step 3:

   DATA TEMP2.HISB;
      MERGE TEMP.HISA(IN=A) UNIQUE(IN=B);
      BY ;
      IF A AND B;

The statement 'IF A AND B' instructs SAS to output only those records where the key value appears in both of the input datasets, thus accomplishing the table lookup. The resulting dataset will be as follows:

CHARGES 300.00 87.00 285.00 84.75

At this point, a dataset containing the history file records for members who also have records in the current monthly file has been created. The next step is to combine this data with the current month file so update processing can be accomplished.

Example 3 Step 4:

   DATA OUT.HIS94NEW;
      SET TEMP2.HISB IN2.CURR0794;

For the Format procedure, the corresponding four steps are as follows:

1. Create dataset UNIQUE the same way as in the Sort/Merge process.

2. Create input control dataset POP:

   DATA POP;
      SET UNIQUE(RENAME=(=START));
      RETAIN LABEL '*' TYPE 'C' FMTNAME 'P;

3. Use the input control dataset to create a format:

   PROC FORMAT CNTLIN=POP;

4. Use this format to extract records from the history file.
At the same time, set it together with the monthly file:

   DATA OUT.HIS94NEW;
      SET IN.HIS94 (WHERE=(PUT(, $ F.)='*'))
          IN2.CURR0794;

A comparison between the Sort/Merge process and the Format procedure was performed under the following conditions:

Operating system: MVS
SAS: Version 6.08
History file: LRECL (record length) = 632, number of observations = 344,768
Monthly file: LRECL (record length) = 632, number of observations = 5,584

Step      Sort/Merge       Format
          (CPU seconds)    (CPU seconds)
1          0.60             0.55
2          3.73             0.06
3          9.47              .26
4          3.20             3.68
Overall   27.08             5.68

The Sort/Merge process required almost twice as much CPU time as the Format procedure. Also, a substantial amount of memory is needed to store the large intermediate datasets created by the Sort/Merge process, TEMP.HISA and TEMP2.HISB. Nevertheless, if the transaction and master files are comparable in size, more resources may be used by the Format procedure than by the Sort/Merge process.

MEMORY LIMITATIONS

Since formats reside in memory, memory limitations may be reached while creating a very large format. For example, if the transaction file has a large number of unique key values (more than 00,000), an error message indicating insufficient memory size may result. In our research, we found that when the START variable had a record length of 6 bytes, the maximum number of records we could put in a format was 83,69 when MEMSIZE=6M. To overcome this memory limitation, either increase the memory size or create multiple formats.

Example 4: Increasing MEMSIZE

The memory size can be increased by coding the region parameter and the MEMSIZE option in your JCL. Memory size can also be increased by coding MEMSIZE in the program as illustrated below.

   OPTIONS MEMSIZE=6M;
   PROC SORT DATA=IN2.CURR0794(KEEP=) OUT=UNIQUE NODUPKEY;
      BY ;
   DATA POP;
      SET UNIQUE(RENAME=(=START));
      LABEL='*';
      TYPE='C';
      FMTNAME=' F';
   PROC FORMAT CNTLIN=POP;

Increase memory size as much as your system will allow. If memory is still inadequate, create multiple formats as illustrated in the next example.

Example 5: Creation of multiple control datasets

You can circumvent the memory requirement by creating multiple input control datasets to produce multiple smaller formats. Note that this example is structured for use in an MVS environment.
Example 5 Program 1:

   LIBNAME USERFMT 'MYACCT.MEMB.NEWMON' DISP=(NEW,CATLG) UNIT=DASD
           RETPD=365 SPACE=(TRK,(0,5),RLSE);
   OPTIONS MEMSIZE=6M;
   PROC SORT DATA=IN2.CURR0794(KEEP=) OUT=UNIQUE NODUPKEY;
      BY ;
   DATA POP POP2;
      SET UNIQUE(RENAME=(=START));
      RETAIN LABEL '*';
      IF _N_ < 8369 THEN DO;
         FMTNAME='$ F';
         OUTPUT POP;
      END;
      ELSE DO;
         FMTNAME='$2F';
         OUTPUT POP2;
      END;
   PROC FORMAT CNTLIN=POP LIBRARY=USERFMT;
   PROC FORMAT CNTLIN=POP2 LIBRARY=USERFMT;

In the above example, the two formats $ F and $2F are created from dataset UNIQUE, which contains 366,338 observations. Another program, or another data step in the same program, can then access the permanent library 'USERFMT' and use the formats to extract information from the claims file as follows.

Example 5 Program 2:

In this example, both of the formats should be used as criteria to extract records corresponding to patients appearing in the dataset IN.HIS94. This is done using the PUT function as follows:

   OPTIONS MEMSIZE=6M FMTSEARCH=(USERFMT);
   DATA OUT.HIS94NEW;
      SET IN.HIS94 (WHERE=(PUT(,$F.) = '*' OR PUT(,$2F.) = '*'));

Note that if your transaction file has a large number of unique key values and creating two input control datasets is not enough to get around memory limitations, you can create as many input control datasets and formats as you like. However, since you can load only two formats in one data step, you will have to use multiple data steps to load these formats. Even though more CPU time is needed for multiple data steps, the Format procedure is still less CPU intensive than the Sort/Merge.

POTENTIAL PITFALLS

When selecting a value for your LABEL, keep in mind the possible values of your comparison variable (in this case, the key). If, in Example 1 above, we had assigned LABEL='' when creating the format, the following would be the outcome of evaluating the records in IN.HIS94:

9344 233675 Returned Value 9 2 Select (Y/N)?

Since the LABEL is only one byte in length, in the three cases where no match is found between the key and START (, 233675, and 9344), the first byte of the key is returned (, 2, and 9 respectively). In the first case (), this value matches the value of LABEL, so the two records are erroneously selected.

One way to ensure that the value of your LABEL will not cause records to be selected erroneously is to create an 'other' condition. To do this, you will need to make use of the DO WHILE loop, as the following example illustrates:

Example 6:

   PROC SORT DATA=IN2.CURR0794 OUT=TEST NODUPKEY;
      BY ;
   DATA TEST2;
      RETAIN FMTNAME '';
      DO WHILE (NOT EOF);
         SET TEST (KEEP= RENAME=(=START)) END=EOF;
         LABEL='';
         OUTPUT;
      END;
      START='OTHER';
      LABEL='BAD';
      OUTPUT;
      STOP;
   PROC FORMAT CNTLIN=TEST2;
   DATA OUT.HIS94NEW;
      SET IN.HIS94 (WHERE=(PUT(,.)=''))
          IN2.CURR0794;

In the data step where TEST2 is created, once the 'not end of file' condition is no longer satisfied (i.e., the end of file has been reached), the DO WHILE loop terminates. The next few lines create an additional record with 'OTHER' as the START value. All records whose key value is not in the list of START values fall into the 'other' condition, so the value of LABEL that is paired with the 'other' condition is returned (in this case 'BAD').
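
A different way to express the same 'other' condition, shown here only as a hedged sketch and not as the authors' method, is to flag the extra control record with the HLO control variable instead of writing the keyword into START. The sketch assumes a SAS release whose input control dataset supports HLO, and the names KEYS, CTRL, and KEEPF, along with the key values, are hypothetical.

```sas
/* Hypothetical key list; in practice it would come from the */
/* de-duplicated transaction file.                           */
data keys;
   input start :$9.;
   datalines;
222222222
444444444
;
run;

data ctrl;
   length fmtname $8 type $1 start $9 label $3 hlo $1;
   retain fmtname 'KEEPF' type 'C';
   set keys end=eof;
   label = 'YES';          /* label returned for a matching key     */
   hlo   = ' ';            /* ordinary single-value range           */
   output;
   if eof then do;         /* after the last key, add one extra row */
      start = ' ';
      label = 'BAD';       /* label returned when nothing matches   */
      hlo   = 'O';         /* 'O' marks this row as the OTHER range */
      output;
   end;
run;

proc format cntlin=ctrl;
run;

/* A key that is not in the list falls into OTHER: RESULT is 'BAD'. */
data _null_;
   length result $3;
   result = put('999999999', $keepf.);
   put result=;
run;
```

Because the extra row is written when END= signals the last observation, this variant needs neither the DO WHILE loop nor a STOP statement. It does assume the key list is not empty, since an empty input would end the step before the extra row is written.
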
Repeating the earlier evaluation of the records in IN.HIS94 with the 'other' condition of Example 6 included, the outcome would be:

Returned Value Select (Y/N)? 9344 BAD BAD BAD 233675 BAD

Notice the last statement in the data step of Example 6 above. The STOP statement causes the data step to stop processing after the 'other' condition record has been output. If the STOP statement is not used, a second 'other' record will be generated and the following note will appear in your log:

NOTE: DATA STEP stopped due to looping.

One other pitfall involves the assignment of the LABEL. You may create a format for reasons other than a table lookup, such as the assignment of an adjustment factor in order to normalize dollar values across localities within a region. In this instance, it would make sense to have the START variable contain zip codes and the LABEL variable contain the associated adjustment factor. However, if you assigned a numeric LABEL value (such as .25), only digits to the left of the decimal will be returned when the format is executed. This occurs because SAS truncates all numeric LABEL values to integers. To get around this, assign the LABEL value as a character. After you execute the format to assign the adjustment factor to a new variable (ADJUST), you can convert this variable back to a numeric for computing purposes.

CONCLUSION

This paper discussed the use of the Format procedure with the Cntlin option when performing a table lookup, and compared the CPU time of the Format procedure to that of the Sort/Merge process. If the master file is substantially larger than the transaction file, it is advantageous to use formats for the update process to save CPU time. Increase memory size as much as your system will allow in order to make larger formats. If memory is still inadequate, create multiple formats. However, if the transaction and master files are comparable in size, you might use more resources in the Format procedure than you would to perform the Sort/Merge process. The paper also discussed potential pitfalls to avoid when assigning the LABEL variable.

REFERENCES

SAS Institute Inc., SAS Procedures Guide, Version 6, Third Edition, Cary, NC: SAS Institute Inc., 1990, pp. 300-302.

ACKNOWLEDGEMENTS

SAS is a registered trademark of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.