Let Hash SUMINC Count For You Joseph Hinson, Accenture Life Sciences, Berwyn, PA, USA

Similar documents
A Breeze through SAS options to Enter a Zero-filled row Kajal Tahiliani, ICON Clinical Research, Warrington, PA

PharmaSUG2014 Paper DS09

PharmaSUG Paper DS06 Designing and Tuning ADaM Datasets. Songhui ZHU, K&L Consulting Services, Fort Washington, PA

An Introduction to Visit Window Challenges and Solutions

Sorting big datasets. Do we really need it? Daniil Shliakhov, Experis Clinical, Kharkiv, Ukraine

Hands-On ADaM ADAE Development Sandra Minjoe, Accenture Life Sciences, Wayne, Pennsylvania Kim Minkalis, Accenture Life Sciences, Wayne, Pennsylvania

CC13 An Automatic Process to Compare Files. Simon Lin, Merck & Co., Inc., Rahway, NJ Huei-Ling Chen, Merck & Co., Inc., Rahway, NJ

Customized Flowcharts Using SAS Annotation Abhinav Srivastva, PaxVax Inc., Redwood City, CA

Indenting with Style

A Practical and Efficient Approach in Generating AE (Adverse Events) Tables within a Clinical Study Environment

Deriving Rows in CDISC ADaM BDS Datasets

Are you Still Afraid of Using Arrays? Let s Explore their Advantages

PharmaSUG 2017 Paper BB02

Importing CSV Data to All Character Variables Arthur L. Carpenter California Occidental Consultants, Anchorage, AK

Hands-On ADaM ADAE Development Sandra Minjoe, Accenture Life Sciences, Wayne, Pennsylvania

Bad Date: How to find true love with Partial Dates! Namrata Pokhrel, Accenture Life Sciences, Berwyn, PA

Taming the Box Plot. Sanjiv Ramalingam, Octagon Research Solutions, Inc., Wayne, PA

PhUse Practical Uses of the DOW Loop in Pharmaceutical Programming Richard Read Allen, Peak Statistical Services, Evergreen, CO, USA

Applying ADaM Principles in Developing a Response Analysis Dataset

Using PROC SQL to Generate Shift Tables More Efficiently

The Basics of PROC FCMP. Dachao Liu Northwestern Universtiy Chicago

PharmaSUG Paper DS24

The Future of Transpose: How SAS Is Rebuilding Its Foundation by Making What Is Old New Again

Interleaving a Dataset with Itself: How and Why

CHAPTER 7 Using Other SAS Software Products

Amie Bissonett, inventiv Health Clinical, Minneapolis, MN

CMISS the SAS Function You May Have Been MISSING Mira Shapiro, Analytic Designers LLC, Bethesda, MD

Effectively Utilizing Loops and Arrays in the DATA Step

Essentials of PDV: Directing the Aim to Understanding the DATA Step! Arthur Xuejun Li, City of Hope National Medical Center, Duarte, CA

Not Just Merge - Complex Derivation Made Easy by Hash Object

KEYWORDS ARRAY statement, DO loop, temporary arrays, MERGE statement, Hash Objects, Big Data, Brute force Techniques, PROC PHREG

One-Step Change from Baseline Calculations

Reading and Writing RTF Documents as Data: Automatic Completion of CONSORT Flow Diagrams

Real Time Clinical Trial Oversight with SAS

USING HASH TABLES FOR AE SEARCH STRATEGIES Vinodita Bongarala, Liz Thomas Seattle Genetics, Inc., Bothell, WA

How to write ADaM specifications like a ninja.

Run your reports through that last loop to standardize the presentation attributes

Basic SAS Hash Programming Techniques Applied in Our Daily Work in Clinical Trials Data Analysis

CONSORT Diagrams with SG Procedures

%MAKE_IT_COUNT: An Example Macro for Dynamic Table Programming Britney Gilbert, Juniper Tree Consulting, Porter, Oklahoma

Hash Objects Why Bother? Barb Crowther SAS Technical Training Specialist. Copyright 2008, SAS Institute Inc. All rights reserved.

A Side of Hash for You To Dig Into

Merging Data Eight Different Ways

PharmaSUG Paper DS-24. Family of PARAM***: PARAM, PARAMCD, PARAMN, PARCATy(N), PARAMTYP

Version 8 Base SAS Performance: How Does It Stack-Up? Robert Ray, SAS Institute Inc, Cary, NC

Using GSUBMIT command to customize the interface in SAS Xin Wang, Fountain Medical Technology Co., ltd, Nanjing, China

AVL 4 4 PDV DECLARE 7 _NEW_

Programmatic Automation of Categorizing and Listing Specific Clinical Terms

Regaining Some Control Over ODS RTF Pagination When Using Proc Report Gary E. Moore, Moore Computing Services, Inc., Little Rock, Arkansas

Programming Beyond the Basics. Find() the power of Hash - How, Why and When to use the SAS Hash Object John Blackwell

The new SAS 9.2 FCMP Procedure, what functions are in your future? John H. Adams, Boehringer Ingelheim Pharmaceutical, Inc.

Practical Uses of the DOW Loop Richard Read Allen, Peak Statistical Services, Evergreen, CO

A SAS Macro Utility to Modify and Validate RTF Outputs for Regional Analyses Jagan Mohan Achi, PPD, Austin, TX Joshua N. Winters, PPD, Rochester, NY

Automated Checking Of Multiple Files Kathyayini Tappeta, Percept Pharma Services, Bridgewater, NJ

Simplifying Your %DO Loop with CALL EXECUTE Arthur Li, City of Hope National Medical Center, Duarte, CA

SAS Online Training: Course contents: Agenda:

Using PROC SQL to Calculate FIRSTOBS David C. Tabano, Kaiser Permanente, Denver, CO

The Benefits of Traceability Beyond Just From SDTM to ADaM in CDISC Standards Maggie Ci Jiang, Teva Pharmaceuticals, Great Valley, PA

Working with Composite Endpoints: Constructing Analysis Data Pushpa Saranadasa, Merck & Co., Inc., Upper Gwynedd, PA

9 Ways to Join Two Datasets David Franklin, Independent Consultant, New Hampshire, USA

RWI not REI a Robust report writing tool for your toughest mountaineering challenges.

Contents of SAS Programming Techniques

Remember to always check your simple SAS function code! Yingqiu Yvette Liu, Merck & Co. Inc., North Wales, PA

Omitting Records with Invalid Default Values

Upholding Ethics and Integrity: A macro-based approach to detect plagiarism in programming

Unlock SAS Code Automation with the Power of Macros

Paper PS05_05 Using SAS to Process Repeated Measures Data Terry Fain, RAND Corporation Cyndie Gareleck, RAND Corporation

Countdown of the Top 10 Ways to Merge Data David Franklin, Independent Consultant, Litchfield, NH

Streamline Table Lookup by Embedding HASH in FCMP Qing Liu, Eli Lilly & Company, Shanghai, China

PharmaSUG Paper TT10 Creating a Customized Graph for Adverse Event Incidence and Duration Sanjiv Ramalingam, Octagon Research Solutions Inc.

Files Arriving at an Inconvenient Time? Let SAS Process Your Files with FILEEXIST While You Sleep

Using PROC REPORT to Cross-Tabulate Multiple Response Items Patrick Thornton, SRI International, Menlo Park, CA

DATA Step in SAS Viya : Essential New Features

PharmaSUG Paper CC11

PharmaSUG Paper PO10

Checking for Duplicates Wendi L. Wright

Interactive Programming Using Task in SAS Studio

USING SAS HASH OBJECTS TO CUT DOWN PROCESSING TIME Girish Narayandas, Optum, Eden Prairie, MN

One Project, Two Teams: The Unblind Leading the Blind

Reproducibly Random Values William Garner, Gilead Sciences, Inc., Foster City, CA Ting Bai, Gilead Sciences, Inc., Foster City, CA

Macro Quoting: Which Function Should We Use? Pengfei Guo, MSD R&D (China) Co., Ltd., Shanghai, China

PharmaSUG Paper TT11

Statistics and Data Analysis. Common Pitfalls in SAS Statistical Analysis Macros in a Mass Production Environment

Preparing the Office of Scientific Investigations (OSI) Requests for Submissions to FDA

Creating an ADaM Data Set for Correlation Analyses

HASH Beyond Lookups Your Code Will Never Be the Same!

Making the most of SAS Jobs in LSAF

This paper describes a report layout for reporting adverse events by study consumption pattern and explains its programming aspects.

Assessing superiority/futility in a clinical trial: from multiplicity to simplicity with SAS

Data Edit-checks Integration using ODS Tagset Niraj J. Pandya, Element Technologies Inc., NJ Vinodh Paida, Impressive Systems Inc.

A Taste of SDTM in Real Time

Creating output datasets using SQL (Structured Query Language) only Andrii Stakhniv, Experis Clinical, Ukraine

It s Proc Tabulate Jim, but not as we know it!

SAS 9 Programming Enhancements Marje Fecht, Prowerk Consulting Ltd Mississauga, Ontario, Canada

Use of Traceability Chains in Study Data and Metadata for Regulatory Electronic Submission

The Proc Transpose Cookbook

How to Incorporate Old SAS Data into a New DATA Step, or What is S-M-U?

PharmaSUG China Mina Chen, Roche (China) Holding Ltd.

Introduction to ADaM and What s new in ADaM

ABSTRACT INTRODUCTION WORK FLOW AND PROGRAM SETUP

Transcription:

ABSTRACT PharmaSUG 2014 - Paper CC02 Let Hash SUMINC Count For You Joseph Hinson, Accenture Life Sciences, Berwyn, PA, USA Counting of events is inevitable in clinical programming and is easily accomplished with SAS procedures such as FREQ, MEANS, SUMMARY, TABULATE, SQL or even by simple DATA step statements. In certain situations where counting is a bit intricate involving data partitioning and classifying, another convenient and efficient way is by hash programming. Within the DATA step, hash objects make data directly available to summary variables without the need to first save the data into a dataset. A little-known hash utility, SUMINC, was introduced with SAS version 9.2. SUMINC is an argument tag for the hash object declaration statement in which it designates a numeric variable for counting. Together with other new SAS 9.2 hash methods REF() and SUM(), counting of items becomes feasible in a key-based manner thus allowing any pattern of counting to be easily accomplished and directly available for summary reporting, as demonstrated in this paper for clinical trial protocol deviation analysis. SUMINC is also useful for NWAY summarization of data, but unlike the SUMMARY procedure or PROC SQL with GROUP BY, NWAY summarization with SUMINC can be combined with DATA step logic within the same DATA step. INTRODUCTION Clinical trial data analyses routinely involve the counting of events, and many SAS tools exist for accomplishing that, notably the FREQUENCY, MEANS, TABULATE, and SQL procedures, among others, and various DATA step approaches including BY processing. Often, one needs to capture the resulting count data from the SAS procedures either through ODS or by simply outputting into a dataset. SAS 9 hash objects are also able to compute counts, for instance with the hash attribute num_items. By grouping populations of interest into their own hash tables, the object.num_items functionality can directly provide the counts for each hash table. A new feature of hash objects introduced in SAS 9.2 was SUMINC 1, an argument tag in the hash declaration statement that assigns a numeric variable for counting. SUMINC instructs the hash object to allocate internal storage for maintaining a count value for each hash key. That count value is incremented by the value of the SUMINC counting variable whenever data is encountered with the matching key(s). The SUMINC-designated counting variable can be negative, positive, zerovalued, or even a non-integer. This paper will demonstrate the use of the SUMINC argument tag, in combination with two other new SAS 9.2 hash methods: REF() and SUM(), to determine the number of occurrences of protocol violations in a clinical study. FEATURES OF THE SUMINC METHOD 1. SUMINC The syntax is: declare hash object (SUMINC: counting_variable); where object is the name of the hash object, and counting_variable must be numeric. 2. REF() When hash objects were first introduced with SAS version 9, two available methods: CHECK() and ADD() could be used to populate a hash object with data. The CHECK() method determines if data exists for a particular hash key. If yes, a return code of 0 is produced, otherwise a non-zero code is issued. The ADD() method, as the name suggests, adds data to a hash table in a key-specific manner, whether the item already exists or not. SAS version 9.2 introduced an additional method, REF(), which combines CHECK() and ADD() into a single method. Thus, object.ref() first checks if the current key-value already exists in the hash table for object. If yes, nothing is done. But if no, the key-value pair is added to the hash table. 3. SUM() The SUM() method retrieves the value of the count associated with the current key and stores it in a totaling variable designated by a SUM argument tag in the SUM statement: rc=object.sum (SUM: totaling_variable). 1

CODING STRATEGY The entire count processing gets implemented in three stages: 1. SUMINC:counting_variable is specified in the hash declaration such that each time a key is found during data input, the value of counting_variable is added to an internal accumulator associated with that particular key. 2. The object.ref method does the key finding and the accumulator value updating. During data input, REF checks to see if the current key is already in the hash table. If not, the key is added and the internal accumulator is initialized to the value of the counting_variable. If the key was found to be already present, no key is added but the counting_variable value is added to the internal accumulator for that key. 3. Once all data have been read, the hash table is read out row-by-row by the hash iterator and the internal accumulator value for each key is captured by the SUM method. The object.sum method has an argument tag also called SUM, with a designated variable, totaling_variable, to capture the internal accumulator counts for each key: object.sum (SUM:totaling_variable). AN EXAMPLE WITH PROTOCOL DEVIATION ANALYSIS Consider a small clinical study situation in which there is the need to determine how many enrolled and treated subjects violated the protocol. Suppose that they were four possible violations: (a) (b) (c) (d) Chemistry test showing ABNORMAL LIVER ENZYMES. Chemistry test showing a POSITIVE PREGNANCY RESULT Immunology test showing ALLERGY TO STUDY DRUG Medication assessment indicating CONSUMPTION OF PROHIBITED MEDICATION Consider also that we need the occurrences of deviations in the following manner: 1. Counts per test parameter per treatment 2. Counts per test parameter 3. Counts of all subjects who deviated 4. Counts of all subjects who deviated per treatment 5. Counts for any deviation The input data has the subject id (USUBJID), the parameter category (PARCAT1), the parameter tested (PARAM), the treatment variable (TRTA), and the test outcome (AVALC). First, a dataset is created with a counting variable, COUNT=1, added to every observation. Next, a hash object for counting, ppt, is declared, having the suminc tag and the numeric counting_variable COUNT. declare hash ppt(suminc:"count"); In order to obtain counts per parameter per treatment, the hash keys are set up as below: rc=ppt.definekey("parcat1","param","trta");*<---[choice of keys determine type of counts]; rc=ppt.definedata("parcat1","param","trta"); rc=ppt.definedone(); declare hiter hippt("ppt"); 2

The input dataset, protdev, is then read in, while the hash method REF checks for keys in order to load the hash table: do until(done); set protdev end=done; rc=ppt.ref(); Finally, using a DO loop and the hash iterator object hippt, the hash table ppt is read out row-by-row while using the SUM method to extract the count totals per key into a variable totcount : rc=hippt.first(); do until (hippt.next() ne 0); ppt.sum(sum:totcount); --- --- --- In reality, the count totals are captured and sent to another hash table, all, for summarization. The all hash object acts as a data collector, storing all the various counts totals retrieved by the SUM method for all the five kinds of deviation analysis. This is one advantage afforded by using the hash technique, where intermediate data during analysis in a DATA step can directly be made available for summarization and manipulation without the need to create intermediate datasets. THE FULL CODE (tested in SAS 9.2): data protdev; infile datalines; input USUBJID $ TRTA $ PARCAT1 $12. PARAM & $23. AVALC $ ; count=1; *<--------[gives every observation a count of 1]; datalines; P001 Drug_A Medication Took Prohibited ConMed Yes P005 Drug_B Chemistry Abnormal Liver Enzymes Yes P006 Drug_B Immunology Allergic To Study Drug Yes P009 Drug_A Chemistry Positive Pregnancy Test Yes P011 Drug_B Chemistry Abnormal Liver Enzymes Yes P016 Drug_B Immunology Allergic To Study Drug Yes P026 Drug_B Immunology Allergic To Study Drug Yes P029 Drug_A Chemistry Positive Pregnancy Test Yes P031 Drug_A Medication Took Prohibited ConMed Yes P035 Drug_B Chemistry Abnormal Liver Enzymes Yes P045 Drug_B Chemistry Abnormal Liver Enzymes Yes P055 Drug_B Chemistry Abnormal Liver Enzymes Yes P056 Drug_B Immunology Allergic To Study Drug Yes P059 Drug_A Chemistry Positive Pregnancy Test Yes P066 Drug_B Immunology Allergic To Study Drug Yes P079 Drug_A Chemistry Positive Pregnancy Test Yes ; run; *--------------------------------------------------------- [1] CREATION OF HASH OBJECT FOR GROUP COUNTS ----------------------------------------------------------; data _null_; length totcount 8 totalcount $8; if(1=2) then set protdev;*<-------[a trick to just put all variables into PDV]; ****COUNTS PER PARAMETER PER TREATMENT*************; declare hash ppt(suminc:"count");*<[suminc counting variable MUST be numeric!]; rc=ppt.definekey("parcat1","param","trta");*<[choice of keys determine counts]; 3

rc=ppt.definedata("parcat1","param","trta"); rc=ppt.definedone(); declare hiter hippt("ppt"); ****COUNTS PER PARAMETER***************************; declare hash pp(suminc:"count"); rc=pp.definekey("parcat1","param"); rc=pp.definedata("parcat1","param","trta"); rc=pp.definedone(); declare hiter hipp("pp"); ****COUNTS OF ALL SUBJECTS WHO DEVIATED************; declare hash sb(suminc:"count"); rc=sb.definekey("count");*<-[variable COUNT used also to indicate any subject]; rc=sb.definedata("count","parcat1","param","trta"); rc=sb.definedone(); declare hiter hisb("sb"); ****COUNTS OF ALL SUBJECTS WHO DEVIATED PER TREATMENT*******; declare hash ts(suminc:"count"); rc=ts.definekey("trta","count");*<[variable COUNT used also for any subject]; rc=ts.definedata("count","parcat1","param","trta"); rc=ts.definedone(); declare hiter hits("ts"); ****FINAL SUMMARY TABLE**************************** -------------------------------------------------------------------------; length total drug_a drug_b $8; declare hash all(ordered:"y"); rc=all.definekey('parcat1','param'); rc=all.definedata('parcat1','param','total','drug_a','drug_b'); rc=all.definedone(); *--------------------------------------------------- [2] INITIATE COUNTING ----------------------------------------------------; do until(done); set protdev end=done; *************FOR COUNT PROCESSING************************; rc=ppt.ref();*<------(count per parameter per treatment); rc=pp.ref();*<------(count per parameter); rc=sb.ref();*<------(count per all deviators); rc=ts.ref();*<------(count per all deviators per treatment); *--------------------------------------------------------------------- [3] HASH READOUT OF COUNT TOTALS USING THE SUM FUNCTION (The data-collection hash object ALL always has keys: parcat1, param) ----------------------------------------------------------------------; *********counts per param per treatment******************; array cx{6} $23 _temporary_; rc=hippt.first(); do until (hippt.next() ne 0); ppt.sum(sum:totcount); x=3+whichc(trta,'total','drug_a','drug_b');*<-[do some data transposition]; 4

*********counts per parameter only************************; rc=hipp.first(); do until (hipp.next() ne 0); TRTA="TOTAL"; pp.sum(sum:totcount); x=3+whichc(trta,'total','drug_a','drug_b');*<-[does some data transposition]; *********counts per subjects only*************************; rc=hisb.first(); do until (hisb.next() ne 0); TRTA="TOTAL"; PARCAT1="ANY"; PARAM="DEVIATION"; sb.sum(sum:totcount); x=3+whichc(trta,'total','drug_a','drug_b');*<--[does some data transposition]; ********counts per treatments only************************; rc=hits.first(); do until (hits.next() ne 0); PARCAT1="ANY"; PARAM="DEVIATION"; ts.sum(sum:totcount); x=3+whichc(trta,'total','drug_a','drug_b');*<--[performs some transposition]; 5

***********overall output*********************************; all.output(dataset:'alltable'); stop; run; *********convert missing values to zeroes**************; data finaltable; set alltable; drug_a=ifc(missing(drug_a),'0',drug_a,'0'); drug_b=ifc(missing(drug_b),'0',drug_b,'0'); run; Output 1 Table 1. Hash table ALLTABLE (from hash object all ) showing protocol deviation counts Output 2 Table 2. Hash table ( FINALTABLE ) showing protocol deviation counts with blanks replaced by zeroes CONCLUSION The novel counting approach demonstrated in this paper involves the hash SUMINC argument tag, the hash REF method, and the hash SUM method. SUMINC defines the counting variable, REF triggers the accumulation of counts, and SUM retrieves the accumulated counts. All the steps are hash-key based, similar to the TABLES statement of the FREQUENCY procedure. Even though the FREQUENCY procedure and the count functionality of the SQL procedure in most cases are quite adequate for counting clinical events, the SUMINC approach provides much more flexibility and convenient data gathering steps. The power of the approach is even more evident when NWAY summarization is required. REFERENCES 1. R. Ray and J. Secosky, Better Hashing in SAS 9.2, SAS Global Forum 2008. http://support.sas.com/resources/papers/sgf2008/hashing92.pdf 6

CONTACT INFORMATION Your comments and questions are valued and encouraged. Contact the author at: Joseph W. Hinson, PhD Accenture Life Sciences 1160 W. Swedesford Rd Berwyn, PA 19312 1-610-407-7580 joehinson@outlook.com SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. indicates USA registration. Other brand and product names are trademarks of their respective companies. 7