Streamline Table Lookup by Embedding HASH in FCMP Qing Liu, Eli Lilly & Company, Shanghai, China

Similar documents
Not Just Merge - Complex Derivation Made Easy by Hash Object

ODS/RTF Pagination Revisit

My Reporting Requires a Full Staff Help!

Customized Flowcharts Using SAS Annotation Abhinav Srivastva, PaxVax Inc., Redwood City, CA

Regaining Some Control Over ODS RTF Pagination When Using Proc Report Gary E. Moore, Moore Computing Services, Inc., Little Rock, Arkansas

PharmaSUG Paper TT10 Creating a Customized Graph for Adverse Event Incidence and Duration Sanjiv Ramalingam, Octagon Research Solutions Inc.

A Side of Hash for You To Dig Into

PharmaSUG 2013 CC26 Automating the Labeling of X- Axis Sanjiv Ramalingam, Vertex Pharmaceuticals, Inc., Cambridge, MA

Hash Objects Why Bother? Barb Crowther SAS Technical Training Specialist. Copyright 2008, SAS Institute Inc. All rights reserved.

Run your reports through that last loop to standardize the presentation attributes

Sorting big datasets. Do we really need it? Daniil Shliakhov, Experis Clinical, Kharkiv, Ukraine

Countdown of the Top 10 Ways to Merge Data David Franklin, Independent Consultant, Litchfield, NH

PharmaSUG China Paper 70

This paper describes a report layout for reporting adverse events by study consumption pattern and explains its programming aspects.

A Visual Step-by-step Approach to Converting an RTF File to an Excel File

Programming Beyond the Basics. Find() the power of Hash - How, Why and When to use the SAS Hash Object John Blackwell

Functions vs. Macros: A Comparison and Summary

A SAS Macro Utility to Modify and Validate RTF Outputs for Regional Analyses Jagan Mohan Achi, PPD, Austin, TX Joshua N. Winters, PPD, Rochester, NY

USING SAS HASH OBJECTS TO CUT DOWN PROCESSING TIME Girish Narayandas, Optum, Eden Prairie, MN

1. Join with PROC SQL a left join that will retain target records having no lookup match. 2. Data Step Merge of the target and lookup files.

An Introduction to PROC REPORT

The Basics of PROC FCMP. Dachao Liu Northwestern Universtiy Chicago

Hash Objects for Everyone

Data Presentation ABSTRACT

Data Edit-checks Integration using ODS Tagset Niraj J. Pandya, Element Technologies Inc., NJ Vinodh Paida, Impressive Systems Inc.

Formatting Highly Detailed Reports: Eye-Friendly, Insight-Facilitating

Quick Results with the Output Delivery System

Paper Haven't I Seen You Before? An Application of DATA Step HASH for Efficient Complex Event Associations. John Schmitz, Luminare Data LLC

A Macro to Create Program Inventory for Analysis Data Reviewer s Guide Xianhua (Allen) Zeng, PAREXEL International, Shanghai, China

ODS for PRINT, REPORT and TABULATE

PharmaSUG China Paper 059

Are you Still Afraid of Using Arrays? Let s Explore their Advantages

Tweaking your tables: Suppressing superfluous subtotals in PROC TABULATE

CC13 An Automatic Process to Compare Files. Simon Lin, Merck & Co., Inc., Rahway, NJ Huei-Ling Chen, Merck & Co., Inc., Rahway, NJ

INTRODUCTION TO SAS HOW SAS WORKS READING RAW DATA INTO SAS

Helping You C What You Can Do with SAS

PharmaSUG China. Systematically Reordering Axis Major Tick Values in SAS Graph Brian Shen, PPDI, ShangHai

Let Hash SUMINC Count For You Joseph Hinson, Accenture Life Sciences, Berwyn, PA, USA

The new SAS 9.2 FCMP Procedure, what functions are in your future? John H. Adams, Boehringer Ingelheim Pharmaceutical, Inc.

9 Ways to Join Two Datasets David Franklin, Independent Consultant, New Hampshire, USA

Submitting SAS Code On The Side

PharmaSUG China. model to include all potential prognostic factors and exploratory variables, 2) select covariates which are significant at

Choosing the Right Technique to Merge Large Data Sets Efficiently Qingfeng Liang, Community Care Behavioral Health Organization, Pittsburgh, PA

Fly over, drill down, and explore

Using GSUBMIT command to customize the interface in SAS Xin Wang, Fountain Medical Technology Co., ltd, Nanjing, China

Merging Data Eight Different Ways

A Macro for Systematic Treatment of Special Values in Weight of Evidence Variable Transformation Chaoxian Cai, Automated Financial Systems, Exton, PA

ABSTRACT INTRODUCTION MACRO. Paper RF

PDF Multi-Level Bookmarks via SAS

Have SAS Annotate your Blank CRF for you! Plus dynamically add color and style to your annotations. Steven Black, Agility-Clinical Inc.

An Efficient Tool for Clinical Data Check

Text Wrapping with Indentation for RTF Reports

RUN_MACRO Run! With PROC FCMP and the RUN_MACRO Function from SAS 9.2, Your SAS Programs Are All Grown Up

SAS Display Manager Windows. For Windows

A Hands-on Introduction to SAS DATA Step Hash Programming Techniques

IT S THE LINES PER PAGE THAT COUNTS Jonathan Squire, C2RA, Cambridge, MA Johnny Tai, Comsys, Portage, MI

Paper SAS FCMP: A Powerful SAS Procedure You Should Be Using Bill McNeill, Andrew Henrick, Mike Whitcher, and Aaron Mays, SAS Institute Inc.

Taming a Spreadsheet Importation Monster

Journey to the center of the earth Deep understanding of SAS language processing mechanism Di Chen, SAS Beijing R&D, Beijing, China

limelightplatform.com

A Tool to Compare Different Data Transfers Jun Wang, FMD K&L, Inc., Nanjing, China

ODS TAGSETS - a Powerful Reporting Method

SAS Online Training: Course contents: Agenda:

Using a HASH Table to Reference Variables in an Array by Name. John Henry King, Hopper, Arkansas

Super boost data transpose puzzle

SAS (Statistical Analysis Software/System)

Hashtag #Efficiency! An Exploration of Hash Tables and Merge Techniques

Macro Quoting: Which Function Should We Use? Pengfei Guo, MSD R&D (China) Co., Ltd., Shanghai, China

SAS 9 Programming Enhancements Marje Fecht, Prowerk Consulting Ltd Mississauga, Ontario, Canada

Implementing external file processing with no record delimiter via a metadata-driven approach

KEYWORDS ARRAY statement, DO loop, temporary arrays, MERGE statement, Hash Objects, Big Data, Brute force Techniques, PROC PHREG

PharmaSUG Paper PO10

User-Written DATA Step Functions Jason Secosky, SAS Institute Inc., Cary, NC

Why Hash? Glen Becker, USAA

Producing Summary Tables in SAS Enterprise Guide

Handling Numeric Representation SAS Errors Caused by Simple Floating-Point Arithmetic Computation Fuad J. Foty, U.S. Census Bureau, Washington, DC

RWI not REI a Robust report writing tool for your toughest mountaineering challenges.

Using PROC REPORT to Cross-Tabulate Multiple Response Items Patrick Thornton, SRI International, Menlo Park, CA

Paper ###-YYYY. SAS Enterprise Guide: A Revolutionary Tool! Jennifer First, Systems Seminar Consultants, Madison, WI

The Power of Combining Data with the PROC SQL

SAS CLINICAL SYLLABUS. DURATION: - 60 Hours

Part 1. Introduction. Chapter 1 Why Use ODS? 3. Chapter 2 ODS Basics 13

Demystifying PROC SQL Join Algorithms

29 Shades of Missing

Uncommon Techniques for Common Variables

A SAS Macro to Generate Caterpillar Plots. Guochen Song, i3 Statprobe, Cary, NC

The REPORT Procedure CHAPTER 32

From Clicking to Coding: Using ODS Graphics Designer as a Tool to Learn Graph Template Language

If You Need These OBS and These VARS, Then Drop IF, and Keep WHERE Jay Iyengar, Data Systems Consultants LLC

SESUG Paper RIV An Obvious Yet Helpful Guide to Developing Recurring Reports in SAS. Rachel Straney, University of Central Florida

Making an RTF file Out of a Text File, With SAS Paper CC13

PharmaSUG China 2018 Paper AD-62

CMISS the SAS Function You May Have Been MISSING Mira Shapiro, Analytic Designers LLC, Bethesda, MD

SOUTHWEST DECISION SCIENCES INSTITUTE INSTRUCTIONS FOR PREPARING PROCEEDINGS

USING HASH TABLES FOR AE SEARCH STRATEGIES Vinodita Bongarala, Liz Thomas Seattle Genetics, Inc., Bothell, WA

IF there is a Better Way than IF-THEN

Presentation Quality Bulleted Lists Using ODS in SAS 9.2. Karl M. Kilgore, PhD, Cetus Group, LLC, Timonium, MD

Correcting for natural time lag bias in non-participants in pre-post intervention evaluation studies

SAS (Statistical Analysis Software/System)

Comparison of different ways using table lookups on huge tables

Transcription:

ABSTRACT PharmaSUG China 2017 - Paper 19 Streamline Table Lookup by Embedding HASH in FCMP Qing Liu, Eli Lilly & Company, Shanghai, China SAS provides many methods to perform a table lookup like Merge Statement, Proc SQL, Format. Starting with SAS 9.3, Even we can define our custom routine to perform table lookup since hashing is available in userdefined subroutines through the FCMP procedure. By embedding a hash object in PROC FCMP functions, simplicity in program can be enhanced. This paper introduces table lookup using HASH in PROC FCMP and demonstrate a practical application in using this technic to solve real world problems. INTRODUCTION Hashing is a search algorithm to index a table, thus to store and retrieve associated data by using index hash key. It has been introduced into SAS since SAS 9. Being considered the fastest way to search large amount information, Hash s main usage is to improve program performance when working with large dataset. While hashing is often used through DATA step hash object in SAS Program, its syntax is often considered complicated and hard to maintain. The FCMP procedure enables you to create user-defined SAS functions using DATA step syntax that is stored in a data set, this function can be called from the DATA step just as you would any other SAS function. Comparing with modern programming language, such as Python, R, SAS function has very limitation in general programming activity. However, starting with SAS 9.3, hashing is available in user-defined subroutines through the FCMP procedure. By embedding HASH in FCMP, we can extend the functionality of user-defined SAS functions and join the power of HASH and FCMP. This could streamline your existing programs, make your program more generic, and enhance simplicity. This paper is not intent to teach fundamental HASH or FCMP knowledge, but to introduce a new way to implement HASH and FCMP TABLE LOOKUP TRADITIONAL APPROACH VS HASHING FCMP There are many ways to look up table: merge, SQL join, Format, Data hashing, and etc. Below is an example comparing using traditional approach and hashing FCMP approach in table lookup. Figure 1. Sates and Capital Sample Data Traditional Approach 1. Data Step /* Data Step approach*/ proc sort data = StatesCapital; by state; proc sort data = States; by state; data new; merge StatesCapital States; by state; 1

2. PROC SQL /* Proc SQL */ proc sql; create table new as select a.state, b.capital from States as a, StatesCapital as b where a.state = b.state; 3. Data Hash /* Data Hash */ data new; if 0 then set work.statescapital; if _n_ = 1 then do; declare hash hh (dataset: "work.statescapital"); hh.definedata("capital"); hh.definekey("state"); hh.definedone(); set States; rc =hh.find(); if rc = 0 then capital = capital; Hash FCMP While many SAS users use Data Step or SQL to perform table lookup in their daily work. Advanced programmer may use hash in their program for efficiency purpose. However, using hash means more syntax, thus the program is much complex and hard to maintain. In this case, wrapping hash in SAS user-defined functions provide a more generic and simple way. And program could be much simplified. /* Define the hash function */ proc fcmp outlib=work.functions.samples; function hash_fcmp(state $) $25; declare hash hh(dataset: "work.statescapital"); rc=hh.definedata("capital"); rc=hh.definekey("state"); rc=hh.definedone(); rc=hh.find(); if rc eq 0 then return(capital); else return('not Found'); endsub; options cmplib=work.functions; /* Call the function */ * In Data Step; data new; set states; capital = hash_fcmp(state); * In Proc SQL; proc sql; create table new as select *,hash_fcmp(state) as capital from states; 2

As we can see in above example, embedding HASH in FCMP made table lookup more elegant. By packaging HASH in FCMP, the complicated HASH syntax goes no way. FCMP HASH OBJECT METHOD AND SYNTAX Method Syntax Description DECLARE DECLARE hash object-name Create a new instance of hash object, create at parse time. DEFINEKEY rc = object.definekey( key 1, key n ) Set up key variables for hash object DEFINEDATA rc = object.definedata( dataset 1, dataset n ) Define data to be stored in hash object DEFINEDONE rc = object.definedone() Indicate the key and data specification is completed DELETE rc = object.delete() Delete the hash object and free any resources allocated FIND rc = object.find( key1, keyn ) Search a hash object based on the values of defined key, If look up is successful, defined data variable are updated CHECK rc = object.check() Search a hash object based on the values of defined key, data will not be updated whether If look up is successful NUM_ITEMS rc = object.num_items() Return the number of items in hash object ADD rc = object.add(key: value1, key: value n) Add data with associated key to hash object REMOVE rc = object.remove(key: value1, key: value n) Remove data with associated key to hash object Table 2. HASH in FCMP Method and Syntax PRACTICAL APPLICATION In a Proportional font, such as the font this article is set in, different letters occupy different amount of horizontal space. For instance, the letter "I" is much narrower than the letter "W". Most books, magazines and other printed materials are set in proportional fonts. similarly, the graphical user interface of many programs uses a proportional font for titles, menus and other text. Examples of commonly-used proportional fonts are Times New Roman, Verdana, Arial, Georgia and Comic Sans. For my experience, there are no exist method in SAS to get a string widths directly. The method used in the following example demonstrate a way to calculate the display width of a proportional font string by joining the power of HASH and PROC FCMP. WHY SHOULD WE GET PRINT WIDTH OF A STRING The idea to calculate the display width of a string comes from two of my projects. The one is using SAS to draw annotation in rectangular box in CRF which is in PDF format in the development of a acrf automation tool, and we need to decide how long the rectangular box should be. The width of rectangular box varied from different string. Unfortunately, the string needs in proportional fonts and we can t get the rectangular box width just by counting characters in string. How Wide The Box Should Be? Figure 2. Determine the width of rectangular box Another is to avoid unwanted wrap up in RTF output since SAS let Microsoft Word to determine the print width. We need to get exact column width for each column so that text would not wrap up. This is especially important in some report which contains many columns and we cannot simply imagine the column width. No enough Space? Figure 3. Determine the width of RTF output 3

THE COMPONENTS OF CHARACTER WIDTH A: the width to add to the current position before placing the character B: the width of the character itself C: the white space to the right of the character The total width is determined by calculating the sum of A + B + C Figure 4. Characters Width SAS APPROACH TO GET WIDTH The process to get a print character width is listed below. 1. Determine the formula: String Width = Sum(Unique Character Width) 2. Got unique character width and stored in table 3. Retrieve using HASH and sum up using FCMP Get unique Character width Completed tables regarding print character width is not available in the website, thus I made a method approximating character width by using SAS. And it worked well. The method is listed below. 1. Output characters in RTF format in specific style, and make sure one character fill with a line(code could be found in the appendix) 2. Read RTF file back line by line, approximate character width using the following formula Length of the Line Width = Count of Chracter 3. Summarize The Hash Table by repeating above steps for different font style. Store the data in a dictionary table Figure 4. The character width dictionary 4

MAKE THE HASH FUNCTION * Customed SAS Function; proc fcmp outlib=width.functions.getwidth; function GetWidth(string $,_font_ $,_style_ $,_font_size_ $); length id $ 1; * Define Hash Table with Character Width Dataset; declare hash Calculate(dataset: "width.charwidth"); rc = Calculate.defineKey("char","_font_","_style_","_font_size_"); rc = Calculate.defineData("_width_"); rc = Calculate.defineDone(); width_tot = 0; * Retrieve Data from Hash and Sum up; do i = 1 to lengthn(string); char = substr(string,i,1); rc = Calculate.find(); if rc eq 0 then width_tot = width_tot+_width_; return(width_tot); endsub; SIMPLE USAGE *------------------- Hash in PROC FCMP Example Usage -------------------; * Using in Data Step; data Width; set TEXT; CharWidth = GetWidth(Value, 'Times','Normal','10'); * Using in Proc SQL; proc sql; create table Width as select GetWidth(Value, 'Times','Normal','10') as CharWidth from TEXT; * Using in Proc MEANS; proc means data = TEXT; where GetWidth(Value, 'Times','Normal','10') > 99; var Value; SIMPLE OUTPUT 5

CONCLUSION Table lookup by embedding Hash in PROC FCMP is not for efficiency, it s to streamline your daily work, enhance program simplicity and extend the functionality of user-defined SAS functions. And repetitive coding can be minimized. REFERENCES Andrew Henrick, Donald Erdman, and Stacey Christian (2013). Hashing in PROC FCMP to Enhance Your Productivity, Proceeding of the SAS Global Forum 2013 Conference. http://www.lexjansen.com/wuss/2013/141_paper.pdf Sample 47224: Load a SAS data set into a Hash Object using PROC FCMP http://support.sas.com/kb/47/224.html PROC FCMP and DATA Step Component Objects http://documentation.sas.com/?docsetid=proc&docsetversion=9.4&docsettarget=n03uc8c8fkguxqn1i5iapv1auq rz.htm&locale=en Kirk Paul Lafler (2016). A Hands-on Introduction to SAS DATA Step Hash Programming Techniques, Procceding of the MidWest SAS Users Group(MWSUG) Conference. http://www.lexjansen.com/mwsug/2016/hw/mwsug-2016-hw02.pdf CONTACT INFORMATION Your comments and questions are valued and encouraged. Contact the author at: Name: Qing Liu Enterprise: Eli Lilly & Company Address: 19F Tower 1 HKRI, Taikoo Hui, No. 288 Shi Men Yi Road City, State ZIP: Shanghai, 200041 Work Phone: 13167017220 E-mail: MeetQingLiu@gmail.com SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. indicates USA registration. Other brand and product names are trademarks of their respective companies. 6

* Dummy all characters in ASCII; data null; length text $200; do i =160,32 to 127; text = repeat(byte(i),300); output; Source Code Sample * Define the output font style; proc template; define style global.text; parent=styles.rtf; replace fonts / 'fixedfont' 'docfont' ; = ("Times",10pt,Bold Normal) = ("Times",10pt,Bold Normal) * Out RTF file and read this file back; options orientation=portrait topmargin=1in bottommargin=1in leftmargin=1in rightmargin=1in nodate; title ''; ods rtf file='test.rtf' style=global.text; proc report data=null NOHEADER nowd; column text; define text/ display; ods rtf close; data width_1; infile "test.txt"; input; length text $200; retain blank; if _n_ = 1 then blank=count(_infile_,' '); if ^missing(_infile_); text = strip(_infile_); if _n_ ne 1; proc sql; create table width_2 as select distinct *, lengthn(text) as length from width_1 group by substr(text,1,1) having lengthn(text)=max(lengthn(text)); data width_3; set width_2; width_tot = 6.3; if _n_ =1 then do; 7

char = ' '; width = (width_tot/blank)*72; output; char = substr(text,1,1); width = (width_tot/length)*72; output; keep char width; proc sort;by width; data width; set width_3; length _font style font_size_ $20; _font_ = 'Times'; _style_ = 'Bold Normal'; _font_size_ = '10'; _width_ = width; keep char _font style font_size width_; data charwidth; set width; * Customed SAS Function; proc fcmp outlib=work.functions.getwidth; function GetWidth(string $,_font_ $,_style_ $,_font_size_ $); length id $ 1; * Retrieve Character Width Using HASH; declare hash Calculate(dataset: " work.charwidth"); rc = Calculate.defineKey("char","_font_","_style_","_font_size_"); rc = Calculate.defineData("_width_"); rc = Calculate.defineDone(); width_tot = 0; * Sum all width up for a string; do i = 1 to lengthn(string); char = substr(string,i,1); rc = Calculate.find(); if rc eq 0 then width_tot = width_tot+_width_; return(width_tot); endsub; options cmplib=work.functions; data a; set sashelp.cars; b=getwidth(strip(model),'times','normal','10'); 8