Lab CTCAE the Perl Way Marina Dolgin, Teva Pharmaceutical Industries Ltd., Netanya, Israel

Similar documents
The Power of Perl Regular Expressions: Processing Dataset-XML documents back to SAS Data Sets Joseph Hinson, inventiv Health, Princeton, NJ, USA

A Brief Tutorial on PERL Regular Expressions

Pharmaceuticals, Health Care, and Life Sciences

PharmaSUG Paper DS-24. Family of PARAM***: PARAM, PARAMCD, PARAMN, PARCATy(N), PARAMTYP

Global Checklist to QC SDTM Lab Data Murali Marneni, PPD, LLC, Morrisville, NC Sekhar Badam, PPD, LLC, Morrisville, NC

Deriving Rows in CDISC ADaM BDS Datasets

Amie Bissonett, inventiv Health Clinical, Minneapolis, MN

Perl Regular Expression The Power to Know the PERL in Your Data

Planning to Pool SDTM by Creating and Maintaining a Sponsor-Specific Controlled Terminology Database

ABSTRACT: INTRODUCTION: WEB CRAWLER OVERVIEW: METHOD 1: WEB CRAWLER IN SAS DATA STEP CODE. Paper CC-17

SAS automation techniques specification driven programming for Lab CTC grade derivation and more

PharmaSUG Paper PO21

Get SAS sy with PROC SQL Amie Bissonett, Pharmanet/i3, Minneapolis, MN

SAS Application to Automate a Comprehensive Review of DEFINE and All of its Components

PharmaSUG Paper DS06 Designing and Tuning ADaM Datasets. Songhui ZHU, K&L Consulting Services, Fort Washington, PA

PharmaSUG China Paper 70

The Benefits of Traceability Beyond Just From SDTM to ADaM in CDISC Standards Maggie Ci Jiang, Teva Pharmaceuticals, Great Valley, PA

CHAOTIC DATA EXAMPLES. Paper CC-033

Increase Defensiveness of Your Code: Regular Expressions

PharmaSUG Paper CC22

A Tool to Compare Different Data Transfers Jun Wang, FMD K&L, Inc., Nanjing, China

Doctor's Prescription to Re-engineer Process of Pinnacle 21 Community Version Friendly ADaM Development

Tips on Creating a Strategy for a CDISC Submission Rajkumar Sharma, Nektar Therapeutics, San Francisco, CA

PharmaSUG 2013 CC26 Automating the Labeling of X- Axis Sanjiv Ramalingam, Vertex Pharmaceuticals, Inc., Cambridge, MA

PharmaSUG Paper PO10

How to validate clinical data more efficiently with SAS Clinical Standards Toolkit

PharmaSUG Paper AD03

Automation of makefile For Use in Clinical Development Nalin Tikoo, BioMarin Pharmaceutical Inc., Novato, CA

Standardization of Lists of Names and Addresses Using SAS Character and Perl Regular Expression (PRX) Functions

Automate Clinical Trial Data Issue Checking and Tracking

CDISC Variable Mapping and Control Terminology Implementation Made Easy

Hands-On ADaM ADAE Development Sandra Minjoe, Accenture Life Sciences, Wayne, Pennsylvania

SAS Online Training: Course contents: Agenda:

An Alternate Way to Create the Standard SDTM Domains

Study Data Reviewer s Guide Completion Guideline

%ANYTL: A Versatile Table/Listing Macro

START CONVERTING FROM TEXT DATE/TIME VALUES

Programming checks: Reviewing the overall quality of the deliverables without parallel programming

PhUSE US Connect 2019

Submission-Ready Define.xml Files Using SAS Clinical Data Integration Melissa R. Martinez, SAS Institute, Cary, NC USA

SDTM Automation with Standard CRF Pages Taylor Markway, SCRI Development Innovations, Carrboro, NC

SAS 9 Programming Enhancements Marje Fecht, Prowerk Consulting Ltd Mississauga, Ontario, Canada

When Powerful SAS Meets PowerShell TM

Not Just Merge - Complex Derivation Made Easy by Hash Object

There s No Such Thing as Normal Clinical Trials Data, or Is There? Daphne Ewing, Octagon Research Solutions, Inc., Wayne, PA

It s All About Getting the Source and Codelist Implementation Right for ADaM Define.xml v2.0

PharmaSUG China

SAS Clinical Data Integration 2.4

Define.xml 2.0: More Functional, More Challenging

When ANY Function Will Just NOT Do

Making the most of SAS Jobs in LSAF

Application of Modular Programming in Clinical Trial Environment Mirjana Stojanovic, CALGB - Statistical Center, DUMC, Durham, NC

Common Programming Errors in CDISC Data

PharmaSUG Paper TT11

ODS/RTF Pagination Revisit

A Taste of SDTM in Real Time

Data Edit-checks Integration using ODS Tagset Niraj J. Pandya, Element Technologies Inc., NJ Vinodh Paida, Impressive Systems Inc.

Preparing the Office of Scientific Investigations (OSI) Requests for Submissions to FDA

PharmaSUG Paper CC02

Pharmaceuticals, Health Care, and Life Sciences. An Approach to CDISC SDTM Implementation for Clinical Trials Data

INTRODUCTION TO SAS HOW SAS WORKS READING RAW DATA INTO SAS

Implementation of Data Cut Off in Analysis of Clinical Trials

JMP Clinical. Getting Started with. JMP Clinical. Version 3.1

Codelists Here, Versions There, Controlled Terminology Everywhere Shelley Dunn, Regulus Therapeutics, San Diego, California

Hands-On ADaM ADAE Development Sandra Minjoe, Accenture Life Sciences, Wayne, Pennsylvania Kim Minkalis, Accenture Life Sciences, Wayne, Pennsylvania

Clinical Data Visualization using TIBCO Spotfire and SAS

PhUse Practical Uses of the DOW Loop in Pharmaceutical Programming Richard Read Allen, Peak Statistical Services, Evergreen, CO, USA

Harmonizing CDISC Data Standards across Companies: A Practical Overview with Examples

Sorting big datasets. Do we really need it? Daniil Shliakhov, Experis Clinical, Kharkiv, Ukraine

Creating output datasets using SQL (Structured Query Language) only Andrii Stakhniv, Experis Clinical, Ukraine

Dealing with changing versions of SDTM and Controlled Terminology (CT)

PhUSE EU Connect Paper PP15. Stop Copying CDISC Standards. Craig Parry, SyneQuaNon, Diss, England

PharmaSUG China Mina Chen, Roche (China) Holding Ltd.

CDISC SDTM and ADaM Real World Issues

Using GSUBMIT command to customize the interface in SAS Xin Wang, Fountain Medical Technology Co., ltd, Nanjing, China

Automation of SDTM Programming in Oncology Disease Response Domain Yiwen Wang, Yu Cheng, Ju Chen Eli Lilly and Company, China

PharmaSUG 2014 PO16. Category CDASH SDTM ADaM. Submission in standardized tabular form. Structure Flexible Rigid Flexible * No Yes Yes

PharmaSUG Poster PO12

Advanced Visualization using TIBCO Spotfire and SAS

Programmatic Automation of Categorizing and Listing Specific Clinical Terms

Working with Composite Endpoints: Constructing Analysis Data Pushpa Saranadasa, Merck & Co., Inc., Upper Gwynedd, PA

Applying ADaM Principles in Developing a Response Analysis Dataset

Introduction to ADaM and What s new in ADaM

Cody s Collection of Popular SAS Programming Tasks and How to Tackle Them

An Efficient Method to Create Titles for Multiple Clinical Reports Using Proc Format within A Do Loop Youying Yu, PharmaNet/i3, West Chester, Ohio

Practical Uses of the DOW Loop Richard Read Allen, Peak Statistical Services, Evergreen, CO

Bad Date: How to find true love with Partial Dates! Namrata Pokhrel, Accenture Life Sciences, Berwyn, PA

Traceability in the ADaM Standard Ed Lombardi, SynteractHCR, Inc., Carlsbad, CA

Are you Still Afraid of Using Arrays? Let s Explore their Advantages

A Macro to Create Program Inventory for Analysis Data Reviewer s Guide Xianhua (Allen) Zeng, PAREXEL International, Shanghai, China

SDTM Implementation Guide Clear as Mud: Strategies for Developing Consistent Company Standards

Implementing CDISC Using SAS. Full book available for purchase here.

ABSTRACT INTRODUCTION WHERE TO START? 1. DATA CHECK FOR CONSISTENCIES

How to write ADaM specifications like a ninja.

When ANY Function Will Just NOT Do

REPORT DOCUMENTATION PAGE

Customized Flowcharts Using SAS Annotation Abhinav Srivastva, PaxVax Inc., Redwood City, CA

Keeping Track of Database Changes During Database Lock

PharmaSUG2014 Paper DS09

Best Practice for Explaining Validation Results in the Study Data Reviewer s Guide

Transcription:

PharmaSUG 2016 - Paper QT13 Lab CTCAE the Perl Way Marina Dolgin, Teva Pharmaceutical Industries Ltd., Netanya, Israel ABSTRACT The NCI Common Terminology Criteria for Adverse Events (CTCAE) is a descriptive terminology which is utilized for Adverse Event (AE) reporting. A toxicity grading scale is provided for each AE term, it varies from 1 (mild) to 5 (death). Typically, CTCAE grading is directly collected from the site on the adverse experience case report form. However, this may not be the case for laboratory results. Usually, the calculation of CTCAE grading for laboratory results would require an explicit coding of each criteria, per each lab test, by means of "If" and "Else" statements. This paper demonstrates a method (macro) for calculation of toxicity grade based on laboratory value and original NCI CTCAE grading file. This macro is using Perl Regular Expressions patterns and functions, which is much more compact solution to a complicated hard coding task and also eliminates a need for per test programming of the toxicity grades. Lastly, macro may be used with both ADaM and SDTM structures. INTRODUCTION The Common Terminology Criteria for Adverse Events (CTCAE) are a set of criteria for recognition and grading toxicity of adverse events (AE). The CTCAE system is a product of the US National Cancer Institute (NCI). It uses a range of grades from 1(Mild Adverse Event) to 5 (Death due to Adverse Event), when grade 0 considered as normal condition. For adverse events CTCAE toxicity grading is directly collected from the site on the case report form (CRF). However, for laboratory values toxicity grading is assigned in later step by a clinical programmer. CTCAE v4.0 Term Grade 1 Grade 2 Grade 3 Grade 4 Anemia Hemoglobin (Hgb) Hgb <10.0-8.0 g/dl; Hgb <8.0 g/dl; <LLN - 10.0 g/dl; <6.2-4.9 mmol/l; <4.9 mmol/l; <80 g/l; <LLN - 6.2 mmol/l; <100-80g/L transfusion indicated <LLN - 100 g/l Blood bilirubin increased Life-threatening consequences; urgent intervention indicated >ULN - 1.5 x ULN >1.5-3.0 x ULN >3.0-10.0 x ULN >10.0 x ULN Lymphocyte count <LLN - 800/mm3; <LLN - 0.8 x 10e9/L <800-500/mm3; <0.8-0.5 x 10e9 /L <500-200/mm3; <0.5-0.2 x 10e9 /L <200/mm3; <0.2 x 10e9 /L Table 1. Example of CTCAE v4.03. Usually, the calculation of CTCAE grading for laboratory results would require an explicit coding of each criteria, per each lab test, as shown below for "Lymphocyte" test: if lbtestcd ='LYMPH' and upcase(lbstresu)='10e9/l' then do; if 0.8<=lbstresn<lbstnrlo then lbtoxgr='1'; else if 0.5<=lbstresn<0.8 then lbtoxgr ='2'; else if 0.2<=lbstresn<0.5 then lbtoxgr ='3'; else if.z<lbstresn<0.2 then lbtoxgr ='4'; else if lbstresn>.z then lbtoxgr ='0'; 1

Some limitations of explicit coding are: 1. Manual coding of all grades criteria found in CTCAE file, per test, per different units. 2. Increased chances for typo errors during hard coding. 3. The code should be revised in order to match different CTCAE versions. The %lb_ctcae macro is using Perl Regular Expressions patterns and functions that allow to locate patterns in text strings i.e. the toxicity criteria and to compare each grade criteria with laboratory value in the data. Advantages of %lb_ctcae macro: 1. No hard coding of grades per each test and units. 2. Values are extracted directly from original CTCAE file. 3. No modification is needed for different CTCAE MedDRA version (4.x & 3.x). 4. Might be used with both ADaM and SDTM structures. CTCAE_v3.xls and CTCAE_v4.xls files are available on NCI website. As CTCAE terms do not always directly indicate the laboratory test to which the grades apply there is a need to assign a testcode to each criteria before working with that file. In addition transposition of the file by testcode and grade is performed. These operations are done only once when a new CTCAE MedDRA version is released. MACRO DESIGN %lb_ctcae(ctc=, datain=, dataout=, tests=, isadam=) &ctc -CTCAE.xls file location &datain -input LAB dataset &dataout -output dataset &tests tests of interest for which grade is calculated &isadam 'Y' for ADaM structure; else SDTM. PREPROCESSING STEP As macro works on both ADaM and SDTM structure, the correct variables from data should be used. %if &isadam="y" %then %do; %let lab_value =aval; %let high_lim =a1hi; %let low_lim =a1lo; % %else %do; %let lab_value =lbstresn; %let high_lim =lbstnrhi; %let low_lim =lbstnrlo; % In first step only tests of interest are taken from laboratory data. In addition, the sequence variable SEQ is created. data lb_test; set &datain.; if testcode in (&tests.); seq =_n_; run; Subsequent merging of metadata file CTCAE.xls with laboratory data is performed. The merging is implemented by testcodes -found in both metadata and lab data. 2

EXTRACTION OF THE RANGES FROM CTCAE FILE The regular expression is a sequence of characters that define a search pattern. Once a pattern is found, it's possible to obtain the position of the pattern, extract a substring, or substitute a string. There are two main patterns of toxicity ranges in CTCAE guide. The first pattern contains units (e.g. "Anemia"); the second pattern is a product of the numerical coefficient and upper (ULN) or lower (LLN) limit of the normal range, without units (e.g. "Blood bilirubin increased ", Table 1). RANGES WITH UNITS An extraction of the ranges for laboratory tests based on standard units found in data (e.g. Hemoglobin in g/l). data CTCAE_lab; set CTCAE_lab; if lbstresu ^= '' then do; pattern = CAT("#(< >)[a-z0-9-.,]+",lbstresu,"#i"); /*Note 1*/ ptrn_ = PRXPARSE(pattern); CALL PRXSUBSTR(ptrn_,descrp,start,length); range_ = SUBSTRN(descrp,start,length); Note 1. Short explanation of regular expression pattern: Hash sign "#" is the delimiter that's specifying a pattern to search. The pattern starts with either less or greater symbol, followed by "[a-z0-9-.,]" - any letter or number, might contain a point or a comma, this part can be repeated one or more time - "+". The CAT function adds standard units from the data to the pattern. The "I" after the final delimiter is used in order to perform case-insensitive search. The PRXPARSE function is used to create a regular expression. The PRXSUBSTR call finds the correct range, the length of it and starting position; subsequent SUBSTRN function extracts the actual range from the text string. At this point RANGE_ variable contains correct toxicity grade limits based on units that are found in data. Next step would be removal of units from the RANGE_ variable. pattern2= CAT("s#(x?", lbstresu,")# #I"); /*Note 2*/ ptrn = PRXPARSE(pattern2); range = PRXCHANGE(ptrn,-1,range_); Note 2. Short explanation of regular expression pattern2: This pattern is used with PRXCHANGE function in order to remove units from range. The "s" in the beginning of the pattern specifies the substitution operator. The pattern itself might "?" contain the multiply symbol "x" and is concatenated with standard units. "# #" Is the substitute portion of the regular expression, it's blank and therefore PRXCHANGE function will remove the identified pattern (standard units) from the string. PRXCHANGE function substitutes units in RANGE_ with blank. The last step could be possibly be done by usage of e.g. TRANWRD function; however PRXCHANGE together with regular expression allow to cover cases where "x" sign is found in the grade description before units like in " " (Table 2). At the end of this step, the RANGE variable contains grade toxicity limits without units. 3

Testcode ADVERSE_EVENT GRADE GRADE_DESCRIPTION Standard Units RANGE_ RANGE T046 Anemia 1 Hemoglobin(Hgb) <LLN-10.0g/dL; <LLN-6.2mmol/L;<LLN-100g/L T046 Anemia 2 Hgb<10.0-8.0g/dL; <6.2-4.9mmol/L;<100-80g/L g/l <LLN-100g/L <LLN-100 g/l <100-80g/L <100-80 T046 Anemia 3 Hgb<8.0-6.5g/dL; <4.9-4.0mmol/L; <80-65g/L;transfusion indicated g/l <80-65g/L <80-65 T046 Anemia 4 Hgb<6.5g/dL;<4.0mmol/L;<65g/L; g/l <65g/L <65 1 <LLN - 800/mm3; <LLN - 0.8 x 10e9/L 10E9/L <LLN- 0.8x10e9/L <LLN-0.8 2 <800-500/mm3; <0.8-0.5 x 10e9 /L 10E9/L <0.8-0.5x10e9/L <0.8-0.5 3 <500-200/mm3; <0.5-0.2 x 10e9 /L 10E9/L <0.5-0.2x10e9/L <0.5-0.2 4 <200/mm3; <0.2 x 10e9 /L 10E9/L <0.2x10e9/L <0.2 Table 2. Extraction of the ranges based on standard units. RANGES WITHOUT UNITS For criteria that are based on the products of upper or lower bound of the normal range next step is being applied in order to calculate toxicity ranges (see Table 3). Here the multipliers are of interest, therefore first step will be detection of multiplier by regular expression pattern, and then multiplication of ULN/LLN by detected value. %let digit = \d+\.?\d*; /* multiplier digit */ data CTCAE_lab2; set CTCAE_lab; retain pattern3 pattern4 pattern5; if _N_ = 1 then do; pattern3 = PRXPARSE("#>(ULN)-(&digit.)x(ULN)#"); /*Note 3*/ pattern4 = PRXPARSE("#>(&digit.)-(&digit.)x(ULN)#"); pattern5 = PRXPARSE("#>(&digit.)x(ULN)#"); if find (descrp, 'x') > 0 then do; if PRXMATCH(pattern3,descrp) >0 then do; CALL PRXPOSN(pattern3,2,start,length); range = CATS(">",&high_lim., "-", SUBSTR(descrp,start,length)* &high_lim.) ; else if PRXMATCH(pattern4,descrp) >0 then do; CALL PRXPOSN(pattern4,1,start,length); CALL PRXPOSN(pattern4,2,start2,length2); range = CATS (">",SUBSTR(descrp,start,length)*&high_lim., "-", SUBSTR(descrp,start2,length2)*&high_lim.); else if PRXMATCH(pattern5,descrp) >0 then do; CALL PRXPOSN(pattern5,1,start,length); range = CATS(">",SUBSTR(descrp,start,length)*&high_lim.); run; The same approach applies also to LLN values. 4

Note 3. Short explanation of regular expression pattern3: &digit -is the multiplying value. The pattern starts with greater symbol, followed by "(ULN)-(&digit.)x(ULN)". Each set of parentheses inside the regular expression represents a subexpression. The PRXPOSN function identifies position and length of a subexpression defined in the regular expression. As the purpose of this step is to identify multiplier value, the number of subexpression to be evaluated by PRXPOSN is 2. For pattern4 PRXPOSN uses 1 & 2 subexpressions and for pattern5-1 subexpression. At the end of this step, the RANGE variable contains grade toxicity limits after multiplication of ULN from laboratory data by multiplier from GRADE_DESCRIPTION variable. Lab. Code ADVERSE_EVENT GRADE GRADE_DESCRIPTION Reference Range Upper Limit-Std Units RANGE T506 Blood bilirubin increased 1 >ULN-1.5xULN 17.1 >17.1-25.65 T506 Blood bilirubin increased 2 >1.5-3.0xULN 17.1 >25.65-51.3 T506 Blood bilirubin increased 3 >3.0-10.0xULN 17.1 >51.3-171 T506 Blood bilirubin increased 4 >10.0xULN 17.1 >171 Table 3. Calculation of the ranges without standard units. TOXICITY GRADE CALCULATION Finally, identification of toxicity grade by comparison of laboratory result value with each grade limits is performed. In general, each RANGE might be presented by one of the four possible patterns, PATTERN3- PATTERN6 below. After detection of the pattern, lower and upper bound values are being extracted from RANGE by PRXPOSN call, comparison of laboratory value (AVAL or LBSTRESN) to the limits is performed. 5

data CTCAE_lab3; set CTCAE_lab2; if _N_ = 1 THEN do; PATTERN3 = PRXPARSE("#>(&digit.)(-)(&digit.)#"); /* >x - y */ PATTERN4 = PRXPARSE("#<(&digit.)(-)(&digit.)#"); /* <x - y */ PATTERN5 = PRXPARSE("#<(&digit.)#"); /* <x */ PATTERN6 = PRXPARSE("#>(&digit.)#"); /* >x */ retain PATTERN:; if PRXMATCH(PATTERN3,range) >0 then do; CALL PRXPOSN(PATTERN3,3,start3,length3); upper_level = SUBSTR(range,start3,length3); upper_level2 = input (upper_level,best.); CALL PRXPOSN(PATTERN3,1,start3,length3); if low_level2 <&lab_value.<= upper_level2 then output; /* >x - y */ else if PRXMATCH(PATTERN4,range) >0 then do; CALL PRXPOSN(PATTERN4,1,start3,length3); upper_level = SUBSTR(range,start3,length3); upper_level2 = input (upper_level,best.); CALL PRXPOSN(PATTERN4,3,start3,length3); if low_level2 <=&lab_value.< upper_level2 then output; /* <x - y */ else if PRXMATCH(PATTERN5,range) >0 then do; CALL PRXPOSN(PATTERN5,1,start3,length3); if low_level2 > &lab_value. then output; /* <x */ else if PRXMATCH(PATTERN6,range) >0 then do; CALL PRXPOSN(PATTERN6,1,start3,length3); if low_level2 < &lab_value. then output; /* >x */ run; MACRO OUTPUT The last dataset contains only records that matched the PATTERN3- PATTERN6 i.e. have any toxicity grade. Those records are merged back to the original laboratory dataset by sequence variable, created in data preprocessing step. In addition, tests of interest, that don't have toxicity value, are getting grade "0". The final dataset &dataout has two new variables: LBTOXGR (toxicity grade, e.g. "2") and LBTOX (AE description, e.g. "Anemia"). CONCLUSION The %lb_ctcae is using Perl Regular Expressions patterns and functions, and provide compact solution to a complicated hard coding task and eliminates a need for per test programming of the toxicity grades. In contrast to hard coding of toxicity grading, which would require a separate coding for each CTCAE version, the macro utilizes original CTCAE MedDRA file and therefore may be applied to different CTCAE versions without any code modifications. In addition, it designed to be used with both ADaM and SDTM structures. 6

REFERENCES National Cancer Institute, Common Terminology Criteria for Adverse Events (CTCAE) v4.03. evs.nci.nih.gov/ftp1/ctcae/ctcae_4.03_2010-06-14.xls Ron Cody, "An Introduction to Perl Regular Expressions in SAS 9", Robert Wood Johnson Medical School, Piscataway, NJ, SUGI29, Available at http://www2.sas.com/proceedings/sugi29/265-29.pdf ACKNOWLEDGMENTS I thank Svetlana Rubinchick for being my reviewer and offering great feedback. CONTACT INFORMATION Your comments and questions are valued and encouraged. Contact the author at: Name: Marina Dolgin Enterprise: Teva Pharmaceutical Industries Ltd. Address: Hatrufa Street 12 City, State ZIP: Netanya 42504, Israel E-mail: marina.dolgin@teva.co.il Web: www.tevapharm.com SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. indicates USA registration. Other brand and product names are trademarks of their respective companies. 7