USING SAS HASH OBJECTS TO CUT DOWN PROCESSING TIME Girish Narayandas, Optum, Eden Prairie, MN

Paper RF-12-2014

ABSTRACT

Hash tables have been available since SAS 9 and are part of DATA step programming. The hash application and syntax style differ from regular DATA step programming, and programmers often shy away from this feature. The hash object is a great option for lookups against unsorted data that fits into memory. It provides an efficient and convenient mechanism for quick data storage and retrieval. Using SAS hash objects can achieve great I/O efficiencies and large time savings when merging tables in the DATA step.

INTRODUCTION

A hash table is a lookup table designed to efficiently store non-contiguous keys (IDs, account numbers, codes, etc.) that may have wide gaps in their alphabetic and numeric sequences. SAS provides two predefined component objects for use in a DATA step: the hash object and the hash iterator object. These objects enable you to rapidly store, search, and retrieve data based on lookup keys, thereby cutting down run time. The DATA step component object interface enables you to create and manipulate these component objects by using statements, attributes, and methods. You use the DATA step object dot notation to access the component object's attributes and methods.

The component object is established by the DECLARE statement in the SAS DATA step:

    DECLARE HASH PROV (DATASET: 'LIB.PROV', HASHEXP: 20);

The name of the object is followed, in parentheses, by the DATASET: argument tag and optional constructor argument tags such as ORDERED: and HASHEXP:. ORDERED: determines how the key variables are ordered in the hash table.
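As a minimal sketch of the dot notation described above, the step below declares a hash object with argument tags, calls its definition methods, and reads one attribute. The WORK.PROV table and its three rows are made up for illustration, and N_ITEMS is a variable name chosen here, not one from the paper.

```sas
/* Hypothetical lookup table, created only so the sketch is self-contained */
data work.prov;
  input prov_id $ prov_state $;
  datalines;
PR001 WA
PR002 ME
PR003 NH
;
run;

data _null_;
  /* Never executes; only defines PROV_ID and PROV_STATE at compile time */
  if 1=2 then set work.prov;

  /* Constructor with argument tags: source table, key order, slot exponent */
  declare hash prov(dataset: 'work.prov', ordered: 'a', hashexp: 8);

  /* Methods are called as object.method(arguments) */
  prov.defineKey('prov_id');
  prov.defineData('prov_state');
  prov.defineDone();

  /* Attributes use the same dot notation, without parentheses */
  n_items = prov.num_items;
  put 'Items loaded: ' n_items;

  /* Nothing is read with SET, so end the step explicitly */
  stop;
run;
```

Under these assumptions the log should report three items. The same object.method() pattern applies to every method used in the examples of this paper.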
Before I attempt to explain the HASHEXP argument, let's understand how SAS hash processing occurs in very simple terms:

- When the object is loaded, each item is stored as a key together with its data values (a key-value pair).
- Hash keys are evenly distributed into hash slots so that each slot contains as few key-value pairs as possible.
- When an item is looked up, its key is hashed to find the appropriate slot. Then the slot is searched for the right key-value pair.

HASHEXP indicates how to allocate memory and distribute the hash items in memory. The simple rule is that the larger the hash table, the larger the value that should be assigned to HASHEXP. The maximum value for HASHEXP is 20 (Eberhardt et al 2010). Finding an efficient value for HASHEXP depends on multiple factors. The value is an exponent, so a value of 10 results in 2^10 = 1,024 buckets or slots. The default is 8 (2^8 = 256). For a million items, HASHEXP = 9 or 10 (512 to 1,024 slots) should provide good performance (Carpenter et al 2012 & SAS Documentation). For a more thorough explanation of HASHEXP and the hash memory model see Dorfman et al 2008, Eberhardt et al 2010 and Carpenter et al 2012.

This paper is not intended to be a complete SAS HASH reference guide, nor is it an attempt to thoroughly explain the deeper mechanism of how SAS hash tables work. The intention is to provide a quick starter guide with some basic examples, to encourage SAS beginners and SAS users who haven't had a chance to use hash tables to start exploring them.

SYNTAX AND EXAMPLES

Hash tables are objects, so you call methods to use them. The syntax of calling a method is to put the name of the hash table, then a dot (.), then the method you want to use. Following the method is a set of parentheses in which you put the specifications for the method. You will see this kind of syntax in the examples that follow.

EXAMPLE 1: Joining two datasets with a simple key

Here is a simple example where we merge a member table and a provider table using the PROV_ID key, bringing the PROV_NAME and PROV_STATE fields into the MBR_PROV dataset that is being created.

Members Table

    Member_ID  Prov_ID  GRP_ID  GRP_CATEGORY
    MBR0001    PR001    GRP01   GCAT1
    MBR0002    PR002    GRP01   GCAT3
    MBR0003    PR003    GRP03   GCAT2
    MBR0004    PR001    GRP02   GCAT2
    MBR0005    PR002    GRP01   GCAT3
    MBR0006    PR004    GRP02   GCAT1
    MBR0007    PR002    GRP01   GCAT2

Provider Table

    Prov_ID  Prov_Name           Prov_State
    PR001    Scarborough Clinic  WA
    PR002    Atlantic Health     ME
    PR003    Fairway Clinics     NH
    PR004    Camden Clinic       VA

To join tables using hash objects there are FOUR basic steps to follow:

STEP 1: The DATA, SET and RUN statements are basic DATA step components that any SAS programmer is familiar with. We are merging LIB.MBR and LIB.PROV to create the MBR_PROV dataset, so the step opens with DATA MBR_PROV;. LIB.MBR is used as the base dataset because we want all of its fields (the base is usually the larger of the two or more datasets involved). Since hash tables use memory, the smaller dataset is loaded into the hash table to avoid possible memory issues.

STEP 2: The hash object needs to know about the data fields and their properties before they can be used. These properties could be defined with LENGTH and FORMAT statements; however, a more efficient way is a SET statement guarded by the always-false condition IF 1=2. SAS reads the SET statement at compile time and extracts the variable properties from the datasets, but the statement never actually executes because the condition 1=2 is never true.

STEP 3: In this step, the DECLARE statement names and instantiates the object: we define a hash named PROV and state which dataset is to be loaded into memory. PROV.DEFINEKEY('PROV_ID') defines the unique key variable(s) that will be used to join with other tables. PROV.DEFINEDATA('PROV_NAME', 'PROV_STATE') names the data fields we want to have available in memory. PROV.DEFINEDONE() is the part of the hash syntax that completes the definition.
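Steps 1 to 3 can be sketched as one complete program. This is a reconstruction from the step descriptions, not the paper's original listing: the tables are created in the WORK library rather than LIB so the sketch is self-contained, and the DECLARE block is wrapped in IF _N_ = 1, a common idiom assumed here so that the hash table is loaded only once. The last two statements are the ones Step 4 explains.

```sas
/* Sample rows from the Members table, placed in WORK (an assumption) */
data work.mbr;
  input member_id $ prov_id $ grp_id $ grp_category $;
  datalines;
MBR0001 PR001 GRP01 GCAT1
MBR0002 PR002 GRP01 GCAT3
MBR0003 PR003 GRP03 GCAT2
MBR0004 PR001 GRP02 GCAT2
MBR0005 PR002 GRP01 GCAT3
MBR0006 PR004 GRP02 GCAT1
MBR0007 PR002 GRP01 GCAT2
;
run;

/* Sample rows from the Provider table; comma-delimited because names contain spaces */
data work.prov;
  infile datalines dsd;
  length prov_id $8 prov_name $20 prov_state $2;
  input prov_id $ prov_name $ prov_state $;
  datalines;
PR001,Scarborough Clinic,WA
PR002,Atlantic Health,ME
PR003,Fairway Clinics,NH
PR004,Camden Clinic,VA
;
run;

/* Step 1: name the output dataset */
data work.mbr_prov;
  /* Step 2: pick up variable attributes at compile time; never executes */
  if 1=2 then set work.mbr work.prov;

  /* Step 3: declare and load the lookup table on the first iteration only */
  if _n_ = 1 then do;
    declare hash prov(dataset: 'work.prov');
    prov.defineKey('prov_id');
    prov.defineData('prov_name', 'prov_state');
    prov.defineDone();
  end;

  /* Step 4: read the base table; FIND returns 0 when the key is in the hash */
  set work.mbr;
  if prov.find(key: prov_id) = 0 then output;
run;
```

Run against these sample rows, the step should write seven observations to WORK.MBR_PROV, because every member's PROV_ID is present in the provider table.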

STEP 4: This is the final step. The DATA step reads observations from the LIB.MBR table and uses the key field, PROV_ID, to join to the hash table. SAS attempts to find matches between the MBR and PROV tables using the field specified in DEFINEKEY, PROV_ID. The FIND method returns 0 when the key is found, so an IF condition on that return value results in an INNER JOIN: only records that match between the two tables are written to the output dataset.

Resulting dataset:

    Member_ID  Prov_ID  GRP_ID  GRP_CATEGORY  Prov_Name           Prov_State
    MBR0001    PR001    GRP01   GCAT1         Scarborough Clinic  WA
    MBR0002    PR002    GRP01   GCAT3         Atlantic Health     ME
    MBR0003    PR003    GRP03   GCAT2         Fairway Clinics     NH
    MBR0004    PR001    GRP02   GCAT2         Scarborough Clinic  WA
    MBR0005    PR002    GRP01   GCAT3         Atlantic Health     ME
    MBR0006    PR004    GRP02   GCAT1         Camden Clinic       VA
    MBR0007    PR002    GRP01   GCAT2         Atlantic Health     ME

EXAMPLE 2: Joining THREE datasets with composite keys

The second example is an enhancement of the first: it joins three datasets and uses a composite key (more than one key field) for the GRP table. In the first example we needed to pick up PROV_NAME and PROV_STATE based on the value of PROV_ID. Depending on table size and the number of data fields used, the SORT procedure can be quite expensive in terms of I/O and adds to the job's overall run time. Hash tables allow us to avoid sorting the two (or potentially more) datasets, which can eat up resources with large real-world datasets. In this example we add a Group (GRP) table, use the two keys GRP_CATEGORY and GRP_ID, and bring GRP_NAME into the output table, MBR_PROV_GRP.

Members Table

    Member_ID  Prov_ID  GRP_ID  GRP_CATEGORY
    MBR0001    PR001    GRP01   GCAT1
    MBR0002    PR002    GRP01   GCAT3
    MBR0003    PR003    GRP03   GCAT2
    MBR0004    PR001    GRP02   GCAT2
    MBR0005    PR002    GRP01   GCAT3
    MBR0006    PR004    GRP02   GCAT1
    MBR0007    PR002    GRP01   GCAT2

Group Table

    GRP_CATEGORY  GRP_ID  GRP_NAME
    GCAT1         GRP01   NORTH WEST AAA
    GCAT1         GRP02   NORTH WEST BBB
    GCAT2         GRP01   SOUTH WEST AAA
    GCAT2         GRP02   SOUTH WEST BBB
    GCAT3         GRP01   EASTERN AAA
    GCAT3         GRP02   EASTERN BBB
    GCAT3         GRP03   EASTERN CCC

Provider Table

    Prov_ID  Prov_Name           Prov_State
    PR001    Scarborough Clinic  WA
    PR002    Atlantic Health     ME
    PR003    Fairway Clinics     NH
    PR004    Camden Clinic       VA

    DATA MBR_PROV_GRP;
      /* Compile-time only: picks up variable attributes, never executes */
      IF 1=2 THEN SET SASUSER.MBR SASUSER.PROV SASUSER.GRP;

      /* Load both lookup tables once, on the first iteration only */
      IF _N_ = 1 THEN DO;
        DECLARE HASH PROV(DATASET: 'SASUSER.PROV');
        PROV.DEFINEKEY('PROV_ID');
        PROV.DEFINEDATA('PROV_NAME', 'PROV_STATE');
        PROV.DEFINEDONE();

        DECLARE HASH GRP(DATASET: 'SASUSER.GRP');
        GRP.DEFINEKEY('GRP_CATEGORY', 'GRP_ID');
        GRP.DEFINEDATA('GRP_NAME');
        GRP.DEFINEDONE();
      END;

      /* Keep a member only when both lookups succeed (inner join) */
      SET SASUSER.MBR;
      IF (PROV.FIND(KEY: PROV_ID) = 0) AND
         (GRP.FIND(KEY: GRP_CATEGORY, KEY: GRP_ID) = 0) THEN OUTPUT;
    RUN;

Resulting dataset:

    Member_ID  Prov_ID  GRP_ID  GRP_CATEGORY  Prov_Name           Prov_State  GRP_NAME
    MBR0001    PR001    GRP01   GCAT1         Scarborough Clinic  WA          NORTH WEST AAA
    MBR0002    PR002    GRP01   GCAT3         Atlantic Health     ME          EASTERN AAA
    MBR0004    PR001    GRP02   GCAT2         Scarborough Clinic  WA          SOUTH WEST BBB
    MBR0005    PR002    GRP01   GCAT3         Atlantic Health     ME          EASTERN AAA
    MBR0006    PR004    GRP02   GCAT1         Camden Clinic       VA          NORTH WEST BBB
    MBR0007    PR002    GRP01   GCAT2         Atlantic Health     ME          SOUTH WEST AAA

MBR0003 is absent because its key combination (GCAT2, GRP03) does not exist in the Group table.

FURTHER READING

This is just the beginning of HASH tables; the goal of the paper was to introduce SAS HASH to beginners as well as to experienced SAS programmers who haven't had the chance to explore it. The paper provides a couple of basic examples that SAS users can take and start using in their own settings to benefit from its efficiency and other advantages.

SORT Data using HASH: The hash object is instantiated in Example 3 (the block marked 1) with the hash name GSORT. The hash object is defined to contain the table SASUSER.GRP and will be ordered by the key variable(s) in ascending order. The DATASET: argument tag is enough to load the entire table into the hash object. The ORDERED: argument tag is optional and specifies the sort order: 'a', 'ascending', 'y' or 'yes' gives ascending order, 'd' or 'descending' gives descending order, and 'n' or 'no' leaves the order undefined.

    DECLARE HASH GSORT(DATASET: 'SASUSER.GRP', ORDERED: 'A');

OTHER METHODS: ADD, DELETE, REMOVE, REPLACE, OUTPUT

DEFINEKEY, DEFINEDATA, DEFINEDONE and FIND are the most popular hash methods; however, the SAS hash object offers additional methods worth mentioning, such as ADD, DELETE, REMOVE, REPLACE and OUTPUT.

- ADD is useful when additional key and data values need to be added to the hash object on the fly.
- DELETE simply deletes a hash table from memory.
- REMOVE removes the data item associated with a given key in the hash table.
- REPLACE replaces the data values associated with a given key in the hash table.
- OUTPUT writes the hash table to a reusable SAS dataset in the specified library.

The code in Example 3 (the block marked 2) shows these methods in use. BEFORE and AFTER data are shown after the code to illustrate their effect.
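Each of these methods also returns a status code when the call is assigned to a variable: zero means success and a nonzero value means failure, in which case the failure is reported through the code rather than as an error in the log. A minimal sketch follows; the variable names (STATE, NAME, RC_ADD1 and so on) and the output table WORK.STATE_LOOKUP are assumptions made for illustration.

```sas
data _null_;
  /* Host variables for the key and data items */
  length state $2 name $30;
  call missing(state, name);

  /* Empty hash, filled with ADD instead of the DATASET: tag */
  declare hash h(ordered: 'a');
  h.defineKey('state');
  h.defineData('state', 'name');
  h.defineDone();

  /* First ADD succeeds; the second uses an existing key and should fail */
  rc_add1 = h.add(key: 'ME', data: 'ME', data: 'Atlantic Health');
  rc_add2 = h.add(key: 'ME', data: 'ME', data: 'Duplicate entry');

  /* REPLACE overwrites the data stored under an existing key */
  rc_rep = h.replace(key: 'ME', data: 'ME', data: 'Atlantic Health Group');

  /* REMOVE on a key that is not in the table should return nonzero */
  rc_rem = h.remove(key: 'ZZ');

  /* Inspect the codes and the item count in the log */
  n_items = h.num_items;
  put rc_add1= rc_add2= rc_rep= rc_rem= n_items=;

  /* Write the table to a dataset, then release the memory */
  rc_out = h.output(dataset: 'work.state_lookup');
  rc_del = h.delete();
run;
```

Testing the return code in an IF statement, as the join examples do with FIND, is what lets a step react to a missing or duplicate key instead of stopping on it.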

EXAMPLE 3: SORTING and ADDITIONAL HASH METHODS

    DATA _NULL_;
      /* Compile-time only: picks up variable attributes, never executes */
      IF 1=2 THEN SET SASUSER.PROV SASUSER.GRP;

      /* 1: sort the GRP table by GRP_NAME in ascending order */
      DECLARE HASH GSORT(DATASET: 'SASUSER.GRP', ORDERED: 'A');
      GSORT.DEFINEKEY('GRP_NAME');
      GSORT.DEFINEDATA('GRP_NAME', 'GRP_CATEGORY', 'GRP_ID');
      GSORT.DEFINEDONE();
      GSORT.OUTPUT(DATASET: 'GRP_SORTED');

      /* 2: load PROV in descending PROV_STATE order, then modify it */
      DECLARE HASH PSORT(DATASET: 'SASUSER.PROV', ORDERED: 'D');
      PSORT.DEFINEKEY('PROV_STATE');
      PSORT.DEFINEDATA('PROV_STATE', 'PROV_NAME', 'PROV_ID');
      PSORT.DEFINEDONE();
      PSORT.ADD(KEY: 'AL', DATA: 'AL', DATA: 'BUSCH HEALTH', DATA: 'PR005');
      PSORT.REMOVE(KEY: 'WA');
      PSORT.REPLACE(KEY: 'NH', DATA: 'NY', DATA: 'FAIRWAY CLINICS', DATA: 'PR003');
      PSORT.OUTPUT(DATASET: 'PROV_SORTED');
      PSORT.DELETE();

      /* All work is done in one pass, so end the step instead of reading rows */
      STOP;
    RUN;

SORTING DATA EXAMPLE

BEFORE

    GRP_CATEGORY  GRP_ID  GRP_NAME
    GCAT1         GRP01   NORTH WEST AAA
    GCAT1         GRP02   NORTH WEST BBB
    GCAT2         GRP01   SOUTH WEST AAA
    GCAT2         GRP02   SOUTH WEST BBB
    GCAT3         GRP01   EASTERN AAA
    GCAT3         GRP02   EASTERN BBB
    GCAT3         GRP03   EASTERN CCC

AFTER

    GRP_NAME        GRP_CATEGORY  GRP_ID
    EASTERN AAA     GCAT3         GRP01
    EASTERN BBB     GCAT3         GRP02
    EASTERN CCC     GCAT3         GRP03
    NORTH WEST AAA  GCAT1         GRP01
    NORTH WEST BBB  GCAT1         GRP02
    SOUTH WEST AAA  GCAT2         GRP01
    SOUTH WEST BBB  GCAT2         GRP02

ADD, DELETE, REMOVE & REPLACE

BEFORE

    Prov_ID  Prov_Name           Prov_State
    PR001    Scarborough Clinic  WA
    PR002    Atlantic Health     ME
    PR003    Fairway Clinics     NH
    PR004    Camden Clinic       VA

AFTER

    Prov_State  Prov_Name        Prov_ID
    VA          Camden Clinic    PR004
    NY          FAIRWAY CLINICS  PR003
    ME          Atlantic Health  PR002
    AL          BUSCH HEALTH     PR005

One of the errors we may come across is a duplicate key value. By default, the SAS hash object keeps the first value it encounters for a key. If one wants to keep the last value of a key instead, the REPLACE method can be used. Prior to SAS 9.2 there was no way to hold multiple items with the same key; SAS 9.2 added the MULTIDATA: argument tag for that purpose. Logically one would expect a lookup table to have unique key values; however, since the hash object can be used for more than a simple lookup table, it needs a mechanism to deal with duplicates. If you load your hash object with the DATASET: tag, you can also specify whether the first or the last occurrence is kept. By default we get the first occurrence. To keep the last occurrence, use the DUPLICATE: tag:

    DECLARE HASH GSORT(DATASET: 'SASUSER.GRP', DUPLICATE: 'REPLACE');

ADVANTAGES AND LIMITATIONS

Advantages:
a. An alternative to a PROC SQL JOIN or DATA step MERGE that runs faster, especially when a very large dataset is joined to a small one.
b. With HASH, there is no need for preparatory sorting or indexing before joining the tables.
c. Handles joining multiple tables in one step, even when they use different key fields.
d. Can bring in multiple values from just one lookup operation.
e. Lookup keys and values in a hash object can be either numeric or character fields.

Limitations:
a. The amount of data that can be loaded into a hash object is limited by the memory available to the SAS session, so a larger lookup table may require additional memory.
b. SAS HASH syntax and terminology differ from conventional SAS syntax, which is more intuitive, so it can take some getting used to before one feels comfortable using HASH.

CONCLUSION

SAS HASH is an underutilized, powerful, memory-based programming tool in the SAS toolbox. This paper introduced the basic elements of SAS HASH: what a hash table is, its general syntax and functioning, the advantages of using it, and a few simple examples. Although the processing time for a SAS programming step depends on several factors, in my experience I was able to cut processing time from hours to minutes using SAS HASH. I hope SAS users across various industries take full advantage of this programming technique to cut down the processing time of common coding operations such as sorting, table lookups and merges.

TRADEMARK CITATIONS

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.

REFERENCES

Blackwell, John (2010). "Find() the power of Hash - How, Why and When to use the SAS Hash Object."

Carpenter, Art (2012). Carpenter's Guide to Innovative SAS Techniques (Chapter 3.3, pages 117-130).

Dorfman, Paul (2005). "Data Step Hash Objects as Programming Tools," Proceedings of the SAS Global Forum 2005 (SUGI 30) Conference.

Lafler, Kirk Paul (2011). "An Introduction to SAS Hash Programming Techniques," Proceedings of the 2011 South East SAS Users Group (SESUG) Conference.

Loren, Judy (2006). "How Do I Love Hash Tables? Let Me Count The Ways!," Proceedings of the Nineteenth Northeast SAS Users Group Conference.

SAS Documentation. Data Step Concepts: Using the HASH Object. http://support.sas.com/documentation/cdl/en/lrcon/65287/html/default/viewer.htm#n1b4cbtmb049xtn1vh9x4waiioz4.htm

Warner-Freeman, Jennifer (2007). "I cut my processing time by 90% using hash tables - You can do it too!" North East SAS Users Group (NESUG 2007).

AUTHOR INFORMATION

Your comments and questions are valued and encouraged. Contact the author at:

Optum Inc
13625 Technology Drive, Eden Prairie, MN 55344
Phone: 952-974-1928
girish.narayandas@optum.com
www.optum.com