USING HASH TABLES FOR AE SEARCH STRATEGIES Vinodita Bongarala, Liz Thomas Seattle Genetics, Inc., Bothell, WA

PharmaSUG 2017 - Paper BB08

ABSTRACT

As part of adverse event safety analysis, adverse events of special interest (AESIs) are identified within a study by a variety of means. Most commonly, AESIs are identified with Standardized MedDRA Queries (SMQs), MedDRA System Organ Classes (SOCs), or a customized collection of MedDRA preferred terms (PTs) or lower level terms (LLTs). Analysis of AESIs is similar regardless of the search strategy used to identify them. Identifying AESIs may involve merging the AE dataset with multiple datasets on keys of different levels (LLT, PT or SOC). Using a hash table instead of multiple sorted or unsorted merges can simplify the code and reduce execution time. Faster processing with SAS hash tables has considerable utility in large safety databases. The efficiency gain of hash tables over other methods of merging (both sorted and unsorted) has been well characterized. This paper demonstrates how to use hash tables to identify AESIs and presents two methods of structuring the resulting data for rapid subsequent tabulation and analysis, with a benchmark comparison against unsorted SQL joins that obtain the same results.

INTRODUCTION

Adverse event analysis is critical in determining the safety profile of any drug. Adverse events are therefore monitored frequently during every phase of a clinical trial, both internally by the study team and pharmacovigilance committees and externally by bodies such as Data Safety Monitoring Boards (DSMBs), Safety Monitoring Committees (SMCs) or regulatory authorities. An adverse event indicates some kind of negative impact on the patient's health and well-being. Some adverse events are unavoidable and reflect the risk associated with the treatment.

Due to the level of detail present in adverse event data, the presence of a medical condition may not be easy to see when only Preferred Terms (PTs) or System Organ Classes (SOCs) are summarized. Standardized MedDRA Queries (SMQs) were developed for MedDRA (Medical Dictionary for Regulatory Activities) to address this issue. SMQs provide a validated tool to identify all PTs considered to be related to a single condition or area of interest. The terms included relate to signs, symptoms, diagnoses, syndromes, physical findings, laboratory and other physiologic test data, etc. that are associated with the medical condition or area of interest. SMQs can be nested, and may include an algorithmic determination of a condition requiring comorbidity of multiple PTs.

NATURE OF ADVERSE EVENTS DATA

The complexity of adverse event analysis can be attributed to the hierarchical nature of AE data. SOCs are a broad classification of the area of the disease, and the classification hierarchy is increasingly refined down to the lower level terms (LLTs), as displayed in the chart below.

Figure 1: MedDRA Hierarchy

The hierarchy of adverse events is not quite as linear as the chart above indicates. The classification is unique from LLT up to HLGT. However, the relationship between SOCs and the rest of the hierarchy is not a strict single classification: a single PT can map to multiple SOCs. MedDRA calls this phenomenon multi-axiality. One of the SOCs is assigned as the primary SOC and should be used for overall AE summaries. However, when using SOC as a search strategy, all PTs that map to an SOC should be considered. In the figure below, the Influenza PT is classified under two SOCs, but Infections and Infestations is assigned as the primary SOC.

Figure 2: Multi-Axiality of the Influenza PT

HASH OBJECTS

Hash objects have been available for use in a DATA step since SAS 9. They provide a convenient means of quick storage and retrieval of data. They reside in memory, so their contents can be retrieved quickly by lookup key, much as array elements are retrieved by index. Hash objects do not require the data to be sorted or indexed. They can be thought of as lookups that return results far faster than unsorted joins with PROC SQL. This paper explores how hash objects can be used efficiently in the form of hash tables for implementing AE search strategies.

The structure of a hash object can be divided into two components: the key component and the data component. The key component maps the key values to data rows and can be numeric or character. The data component can store multiple data values per key value and can consist of numeric or character values. A hash table can be loaded from a SAS dataset or from hard-coded values, and it exists only for the duration of the DATA step.

USING HASH TABLES TO IMPLEMENT AE SEARCH STRATEGY

Suppose a search strategy encompasses three levels: SMQs, SOCs and PTs. Since SMQs may be broken down by LLT, the conventional approach requires three separate joins:

- one on LLT level
- one on SOC level
- one on PT level

The same results can be obtained with hash tables. By instantiating and referencing three hash tables in a single DATA step, we can pull out the LLT, SOC and PT matches with one pass through the data. To accomplish this, assume three datasets have been created:

- SMQ dataset (with LLT codes and search name)
- SOC dataset (with SOCs and search name)
- PT dataset (with PTs and search name)

Search name refers to the intent of a search, such as the AESI name or a description of the unifying search strategy (e.g. hepatotoxicity). Using a search name accommodates multiple searches in a single pass, creating outputs for multiple AESIs.
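The three lookup datasets can be prepared in many ways. As a minimal sketch, they could be built from in-stream data as shown here. The dataset and variable names follow the DATA step used in this paper, but the LLT codes, terms, search names and scope values are placeholders chosen purely for illustration and do not represent a validated search strategy.

```sas
/* Illustrative lookup data only: every code and term below is a placeholder. */

/* SMQ-level lookup: one row per LLT code, with search name and scope */
data work.smq;
  length llt_code $160 search_name $100 scope $8;
  infile datalines dsd truncover;
  input llt_code search_name scope;
  datalines;
10000001,Hepatotoxicity,NARROW
10000002,Hepatotoxicity,BROAD
;
run;

/* SOC-level lookup: one row per SOC name and search name */
data work.ssoc;
  length soc_name $480 search_name $100;
  infile datalines dsd truncover;
  input soc_name search_name;
  datalines;
Hepatobiliary disorders,Hepatotoxicity
;
run;

/* PT-level lookup: the same PT may appear under more than one search name, */
/* which is why the hash tables are declared with multidata: 'y'            */
data work.spt;
  length pt_name $480 search_name $100;
  infile datalines dsd truncover;
  input pt_name search_name;
  datalines;
Hepatic failure,Hepatotoxicity
Hepatic failure,Liver events (custom)
;
run;
```

Character keys in a hash lookup are compared exactly, so the key values in these datasets need to match the AE data in case and spelling.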

The following DATA step performs all three lookups in one pass through the AE data:

    data search_strategy(drop=rc llt_code aellt aeseq recordposition);
      if _n_=1 then do;
        length search_name $100 llt_code $160 scope $8 soc_name $480 pt_name $480;
        /* SMQ lookup keyed on LLT code */
        declare hash smq (dataset: 'work.smq', multidata: 'y');
        rc=smq.definekey('llt_code');
        rc=smq.definedata('search_name', 'scope');
        rc=smq.definedone();
        /* SOC lookup keyed on SOC name */
        declare hash sch (dataset: 'work.ssoc', multidata: 'y');
        rc=sch.definekey('soc_name');
        rc=sch.definedata('search_name');
        rc=sch.definedone();
        /* PT lookup keyed on PT name */
        declare hash pth (dataset: 'work.spt', multidata: 'y');
        rc=pth.definekey('pt_name');
        rc=pth.definedata('search_name');
        rc=pth.definedone();
        call missing(search_name, scope);
      end;
      set ae(keep=subject aeseq aeterm startd aestdtc_raw strttime aeser related aesev aerel aerelbn aeout aeendtc_raw stttime llt_code pt_name soc_name recordposition);
      /* output one record for every SMQ entry that matches the LLT code */
      rc=smq.find(key:llt_code);
      do while (rc=0);
        output;
        rc=smq.find_next();
      end;
      /* scope applies to SMQ matches only, so clear it before the SOC and PT lookups */
      call missing(scope);
      rc=sch.find(key:soc_name);
      do while (rc=0);
        output;
        rc=sch.find_next();
      end;
      rc=pth.find(key:pt_name);
      do while (rc=0);
        output;
        rc=pth.find_next();
      end;
    run;

Section 1: Defining the hash table

Use a length statement to initialize the variables that will be joined to the table:

    length search_name $100 llt_code $160 scope $8 soc_name $480 pt_name $480;

Declare the hash with a name (smq) and a source dataset (dataset), and indicate whether a single key can match multiple values (multidata):

    declare hash smq (dataset: 'work.smq', multidata: 'y');

Use a return-code variable (rc) to capture the result of defining the key (which should match a variable in your dataset), defining the data to join, and ending the define block:

    rc=smq.definekey('llt_code');
    rc=smq.definedata('search_name', 'scope');
    rc=smq.definedone();

After declaring and creating the hash object, use the call missing routine to assign missing values to the variables. This avoids an uninitialized note from SAS.

Section 2: Using the hash

Use the return-code variable to try to find the key in the hash table. If no match is found, rc takes a nonzero value. Because the table is declared with multidata, find retrieves the first entry for the key and find_next retrieves each subsequent entry for the same key:

    rc=smq.find(key:llt_code);
    do while (rc=0);
      output;
      rc=smq.find_next();
    end;

This code outputs only records that match a value from the hash table. Specifically, it outputs AEs whose llt_code is in the SMQ hash. A single AE may match more than one SMQ and will be output multiple times with different lookup values. Structuring the data in this way allows BY-group processing to rapidly repeat similar tabulations for different search strategies.

Alternate structure

Once a signal has been found with an established search strategy for an AESI, the AESI may need to be flagged at the ADAE dataset level. A similar use of hash tables makes this possible for simple searches with one pass through the data and no sorting, simply by creating a separate hash table for each search element of the AESIs that need to be flagged, at the corresponding key variable level.

Example code: Suppose two AESIs have been identified, AESI_A by a list of PTs and AESI_B by a combination search strategy of PTs and LLTs:

    data adae(drop=rc llt_code aellt aeseq recordposition);
      if _n_=1 then do;
        length llt_code $160 pt_name $480 FL_A $1 FL_B $1;
        declare hash a_pt (dataset: 'work.a_pt');
        rc=a_pt.definekey('pt_name');
        rc=a_pt.definedone();
        declare hash b_pt (dataset: 'work.b_pt');
        rc=b_pt.definekey('pt_name');
        rc=b_pt.definedone();
        declare hash b_llt (dataset: 'work.b_llt');
        rc=b_llt.definekey('llt_code');
        rc=b_llt.definedone();
        call missing(fl_a, FL_B);
      end;
      set adae;
      /* AESI_A: PT list only */
      rc=a_pt.find(key:pt_name);
      if rc=0 then fl_a='Y';
      /* AESI_B: PT list first, then LLT list */
      rc=b_pt.find(key:pt_name);
      if rc=0 then fl_b='Y';
      else do;
        rc=b_llt.find(key:llt_code);
        if rc=0 then fl_b='Y';
      end;
    run;
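Because the hash tables in this flagging step hold only keys, the same lookups could also be written with the hash CHECK method, which tests whether a key exists without retrieving any data. The following is a minimal sketch under assumptions, not the code used for this paper: it assumes that adae already contains pt_name and llt_code with the same type and length as in work.a_pt, work.b_pt and work.b_llt, and it writes to a hypothetical output dataset, work.adae_flagged, rather than overwriting adae.

```sas
/* Sketch only: work.adae_flagged is a hypothetical output name, and  */
/* pt_name and llt_code are assumed to exist in adae with matching    */
/* type and length in the lookup datasets.                            */
data work.adae_flagged;
  length fl_a fl_b $1;
  if _n_ = 1 then do;
    /* key-only hash tables, one per search element */
    declare hash a_pt (dataset: 'work.a_pt');
    a_pt.definekey('pt_name');
    a_pt.definedone();
    declare hash b_pt (dataset: 'work.b_pt');
    b_pt.definekey('pt_name');
    b_pt.definedone();
    declare hash b_llt (dataset: 'work.b_llt');
    b_llt.definekey('llt_code');
    b_llt.definedone();
  end;
  set adae;
  /* CHECK returns 0 when the key is present in the table */
  if a_pt.check(key: pt_name) = 0 then fl_a = 'Y';
  if b_pt.check(key: pt_name) = 0 or b_llt.check(key: llt_code) = 0 then fl_b = 'Y';
run;
```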

This will result in a copy of ADAE in which every record with a PT matching a_pt is flagged with fl_a='Y', and every record matching either a PT in b_pt or an LLT in b_llt is flagged with fl_b='Y'.

Efficiency gain

When datasets are small, an n-fold efficiency gain is an interesting theoretical result. However, since AE search strategies are routinely applied across large safety databases or to pooled AE data from many different trials, a multiplicative gain in efficiency can mean the difference between getting same-day results and requiring a job to run overnight. We ran a simple example on a relatively small dataset, comparing an SMQ hash lookup on LLTs to a single unsorted SQL join:

                         real time       cpu time
    Hash                 0.03 seconds    0.03 seconds
    Unsorted SQL join    0.14 seconds    0.14 seconds

CONCLUSION

For the purpose of adverse event analysis, hash tables can provide much faster results than SQL joins. In our small example, the hash lookup took about one fifth of the time of the unsorted SQL join. This paper described two methods of identifying AESIs using hash tables and showed how AEs that match multiple SMQs can be identified.

REFERENCES

- SAS Certification Prep Guide: Advanced Programming for SAS 9
- http://www.pharmasug.org/proceedings/2013/bb/pharmasug-2013-bb01.pdf
- http://www.pharmasug.org/proceedings/2013/ho/pharmasug-2013-ho02.pdf

ACKNOWLEDGMENTS

We would like to acknowledge Rajeev Karanam for reviewing our paper and providing insightful comments.

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the authors at:

Name: Vinodita Bongarala
Enterprise: Seattle Genetics, Inc.
Address: 21823-30th Drive S.E. Bothell, WA 98021
Work Phone: 425-527-4288
E-mail: vbongarala@seagen.com

Name: Liz Thomas
Enterprise: Seattle Genetics, Inc.
Address: 21823-30th Drive S.E. Bothell, WA 98021
Work Phone: 425-527-2370
E-mail: ethomas@seagen.com

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.