Basic SAS Hash Programming Techniques Applied in Our Daily Work in Clinical Trials Data Analysis

PharmaSUG China 2018 Paper 18

Fanyu Li, MSD, Beijing, China

ABSTRACT

As SAS programming techniques develop, more and more methods become available in SAS. Statistical programmers often need to merge information from one dataset into another. The simple cases are one-to-one and one-to-many merges. Sometimes, however, we face a many-to-many merge. What should we do then? We can use a hash object to perform lookups within the DATA step more easily. The hash object is a powerful technique that can improve the efficiency of table lookups. This paper presents some practical examples that programmers may encounter in their daily work. It also raises some points that need attention when using hash programming techniques.

INTRODUCTION

The hash programming technique provides an efficient, convenient mechanism for quick data storage and retrieval. It stores and retrieves data based on lookup keys. There are three steps to create a hash object:

1. Declare the hash object.
2. Create an instance of (instantiate) the hash object.
3. Initialize lookup keys and data.

The DATA step is the basic building block of SAS programming. However, when dealing with large volumes of data, ordinary DATA steps can be slow. A hash object is a type of array that a program accesses using keys. It consists of key items and data items. A hash function maps the keys to positions in the array. Many papers already explain how to create a hash object, so this paper does not repeat those details. Instead, it presents some examples of how to use it in clinical trials.

EXAMPLE 1

The following example uses a SASHELP dataset to show how to implement the hash iterator object.
The hash iterator object works together with the hash object, which stores and searches data based on lookup keys. The hash iterator object enables you to retrieve the hash object data in either forward or reverse key order. SASHELP contains a dataset called CARS. It holds information such as the make, model, and type of each car. Let's look at a snapshot of this dataset.
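As a rough mental model only (this is Python, not SAS, and not the paper's code), a keyed table with an ordered iterator can be pictured as a dictionary plus a sorted walk over its keys. The names `records`, `table`, `forward`, and `backward` are illustrative, and the rows are hypothetical toy values, not rows from SASHELP.CARS.

```python
# Python stand-in for a keyed table with an ordered iterator (not SAS code).
# The rows are hypothetical toy values, not rows from SASHELP.CARS.
records = [
    {"model": "A", "horsepower": 420},
    {"model": "B", "horsepower": 355},
    {"model": "C", "horsepower": 500},
]

# Load step: horsepower is the key item, the whole record is the data item.
table = {rec["horsepower"]: rec for rec in records}

# Direct lookup by key, with no scan over the other records.
found = table.get(355)

# Iterator-style traversal in forward and in reverse key order.
forward = [table[k]["model"] for k in sorted(table)]
backward = [table[k]["model"] for k in sorted(table, reverse=True)]

print(found["model"])
print(forward)
print(backward)
```

The lookup answers "which record has this key", while the traversal answers "what are all the records, in key order". The two examples in this paper turn on that distinction.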

Assume that a customer needs to find records with horsepower greater than 350. We used the following statements. After running the above code, there are six such records, output in ascending order of horsepower. The log is clean. However, a simple check shows that this result is not correct. Why is it not correct, and how can we fix the code? After modifying the code, the correct output below shows all the records with horsepower greater than 350.
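The SAS listings for this step are not preserved in this transcript. As a hedged illustration in Python (a model of the behavior, not the paper's program), the sketch below contrasts a table that holds one record per key with a table that holds a list of records per key. The makes and horsepower values are hypothetical toy data, and `unique_table` and `multi_table` are illustrative names.

```python
from collections import defaultdict

# Hypothetical toy rows (make, horsepower); not actual SASHELP.CARS values.
rows = [
    ("MakeA", 390),
    ("MakeB", 390),
    ("MakeC", 493),
    ("MakeC", 493),
    ("MakeD", 400),
    ("MakeE", 300),
]
selected = [r for r in rows if r[1] > 350]

# One record per key: a later row whose key is already stored is ignored.
unique_table = {}
for make, hp in selected:
    unique_table.setdefault(hp, (make, hp))

# Several records per key: every row is kept under its key.
multi_table = defaultdict(list)
for make, hp in selected:
    multi_table[hp].append((make, hp))

# Output both tables in ascending key order.
unique_out = [unique_table[k] for k in sorted(unique_table)]
multi_out = [rec for k in sorted(multi_table) for rec in multi_table[k]]

print(len(selected), len(unique_out), len(multi_out))
```

In this toy data five rows pass the filter, but the one-record-per-key table returns only three, and nothing in the run signals the loss. Comparing the output row count with a plain filtered count is a cheap way to catch it.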

Let's compare the two outputs. Five records did not appear in the previous output. The first output shows only one record each for Jaguar and Mercedes-Benz. The hash object defines horsepower as the hash key, but horsepower is not unique, which is why only one record appears for Jaguar and for Mercedes-Benz. After adding the option Multidata: Y, all the records with horsepower greater than 350 are selected. This example is very simple, but it shows that you need to pay attention to whether the key is unique. If it is not, remember to add the Multidata: Y option.

If the customer wants to find the records with horsepower greater than 350 and the cheapest retail price at each horsepower value, can we get the result by only revising the previous program? The answer is yes, and the modification is very small. After rerunning the code, the result is as follows. There are six distinct horsepower values greater than 350. If the horsepower is 390, the cheapest car is a Jaguar with a retail price of $63,120. If the horsepower is 493, there are three records for Mercedes-Benz, and the cheapest one is $121,770.

When the key is not unique and Multidata is not used, only the first or the last record for each key can be kept. The DUPLICATE option determines how duplicate keys are handled when loading a data set into the hash object. The default is to store the first key and ignore all subsequent duplicates. To store the last duplicate key record, use replace as the option's value. To check whether the key is unique, use error as the option's value, which reports an error to the log if the key is not unique.

EXAMPLE 2

This example shows how the hash programming technique is applied to EPOCH derivation. The SE domain already contains EPOCH. Suppose that we need to derive EPOCH in the VS domain. Let's look at the SE domain dataset first. We only need to consider dates when deriving EPOCH. In the SE domain, the above snapshot keeps just four variables: SUBJID, EPOCH, SESTDTC, SEENDTC. In the VS domain, we keep just four variables: SUBJID, VSTEST, VSDTC, VSSTRESN. The snapshot below shows only part of the records in the VS domain.

If VSDTC falls between SESTDTC and SEENDTC within the same subject, the EPOCH from the SE domain should be assigned as the EPOCH in the VS domain. Let's see how to implement it. After running the above code, the log is clean. The result is shown below.

Obviously, the result is not correct. The highlighted records (subject 110109) were merged with SESTDTC and SEENDTC values that do not belong to subject 110109 in the SE domain; they came from other patients. Whenever VSDTC fell between an SESTDTC and an SEENDTC, a record was created. The hash iterator passes through the whole SE domain dataset, ignoring the key. The DATA step hash iterator is a companion to the DATA step hash object. It enables programs to step through hash object records without performing key lookups. If a record's VSDTC falls between SESTDTC and SEENDTC, it is considered a 'match' and is output to the VS_1 dataset.

How can we revise the code to get the correct result? Let's look at the revised code below. We chose to use the hash object itself, not the hash iterator. After running the revised code, we get the correct result.

Let's look at the above results. In the SE domain, subject 110109 had three records. The screening epoch ran from 1940-10-15 to 2018-04-12. The treatment epoch ran from 2018-04-12 to 2018-05-22. The period after 2018-05-22 was the follow-up epoch. In this example, if a date is both the end date of one epoch and the start date of the next, the record is assigned the next epoch. If all records in VS have an EPOCH, VS_2 should contain the same records as VS. If some records do not have an EPOCH, the programmer should check why it is missing for those records.
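The revised SAS code is not preserved in this transcript. The following Python sketch models the logic described above under stated assumptions; it is not the author's program. It assumes SE records are grouped by subject and ordered by start date, and that dates are ISO 8601 strings. The subject IDs, dates, and the names `se`, `naive_epochs`, and `derive_epoch` are hypothetical.

```python
# Hypothetical SE records grouped by subject: (epoch, start date, end date).
# ISO 8601 date strings compare correctly as plain strings.
se = {
    "S1": [
        ("SCREENING", "2018-01-01", "2018-01-10"),
        ("TREATMENT", "2018-01-10", "2018-02-10"),
        ("FOLLOW-UP", "2018-02-10", "2018-03-10"),
    ],
    "S2": [("SCREENING", "2018-01-05", "2018-01-20")],
}

def naive_epochs(vsdtc):
    """Walk every SE record and ignore the subject key (the faulty approach)."""
    return [name
            for windows in se.values()
            for name, start, end in windows
            if start <= vsdtc <= end]

def derive_epoch(subjid, vsdtc):
    """Look up only this subject's SE records, then test the date window."""
    epoch = None
    for name, start, end in se.get(subjid, []):
        if start <= vsdtc <= end:
            # On a shared boundary date the later epoch overwrites the earlier.
            epoch = name
    return epoch

vs = [("S1", "2018-01-05"), ("S1", "2018-01-10"), ("S2", "2018-02-01")]
result = [(subjid, vsdtc, derive_epoch(subjid, vsdtc)) for subjid, vsdtc in vs]
print(result)
print(naive_epochs("2018-02-01"))
```

For the S2 record dated 2018-02-01, the keyed version returns no epoch, which flags the record for review. The walk that ignores the key picks up the treatment window of S1, and that is the cross-subject contamination seen in VS_1.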

CONCLUSION

SAS hash programming is a powerful and efficient approach for table lookups and many-to-many merges. When you face a many-to-many merge, you can try this method instead of a Cartesian join. Although it may seem a little difficult at first, it lets you write simpler code for the task. The examples here are lessons I learned while learning to use this method. Once you are familiar with the basic statements and run into issues, look up the details and apply the appropriate options.

REFERENCES

SAS 9.4 Language Reference: Concepts, Sixth Edition. Using the Hash Object.

Getting Started with the DATA Step Hash Iterator. Janice Bloom, SAS Institute Inc., Cary, NC and Jason Secosky, SAS Institute Inc., Cary, NC.

SCSUG 2012, SAS HASH Programming basics, Daniel Sakya, PPD Inc., Austin, TX.

ACKNOWLEDGMENTS

I would like to thank Peng Wan for encouraging and supporting me to participate in this conference and present this paper. I would also like to thank Yong Cao for his guidance and help in completing this paper.

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the author at:

Name: Fanyu Li
Enterprise: MSD
Address: 8F, Building 21 Rongda Road, Wangjing R&D Base, Zhongguancun Electronic Zone West Zone, Chaoyang District.
City, State ZIP: Beijing, 100012
Work Phone: 58609282
E-mail: Fanyu.li@merck.com

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.