Paper 146-29

A Group Scatter Plot with Clustering

Xiaoli Hu, Wyeth Consumer Healthcare, Madison, NJ

ABSTRACT

In pharmacokinetic studies, abnormally high values of the maximum plasma concentration (Cmax) of a drug observed in some individuals may raise concerns about potential safety issues. In a clinical program where many different formulations containing the same drug are tested in different studies, it is useful to be able to visually examine individual Cmax values across formulations and studies. This paper provides SAS code to create such a graphic in rich text format (RTF), where the data for each formulation observed across studies are shown as separate clusters.

INTRODUCTION

In pharmacokinetic studies, the maximum plasma concentration of a given drug, Cmax, is typically used to assess the rate of the drug's absorption. In addition, abnormally high values of Cmax observed in some individuals may signal potential safety issues, especially when comparing across formulations containing the same drug. Therefore, it is useful to visually examine a scatter plot of such data collected from different studies using several formulations containing the drug. This paper provides SAS code to create such a presentation in rich text format (RTF). The figure shows the data for each formulation observed across studies as separate clusters, together with the average value of Cmax for each group of observations.

SAS CODE AND PRESENTATION

The data set used to create the graphic has 4 variables: study number (1-5), subject number, formulation (1-4), and the individual Cmax value of the drug. Here are some observations from the CMAX data set.
Study   Subject   Rx   Cmax
1       1001      3    11.86
1       1002      4    10.60
1       1003      4     6.73
1       1003      3     5.89
1       1023      3     9.20
1       1023      4     9.29
1       1024      3    12.52
1       1024      4    20.47
2       2001      1    11.26
2       2001      4    10.48
2       2002      1    10.15
2       2027      4    16.86
2       2027      1    18.78
2       2028      1    12.72
2       2028      4    10.16
3       3001      1     2.85
3       3001      3     5.21
3       3002      3     5.48
3       3002      1     5.32
3       3003      1    12.45
3       3003      3    10.70

The following graphic shows the individual Cmax values, as well as the group means, by study and formulation. Note that the two most extreme outliers are both from Study 3, which cautions against drawing any conclusions about the formulations without taking this into account.
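The clustering in the graphic comes from the horizontal position given to each point. As a reading of the macro listed in this paper (a worked restatement, not additional code), a point for formulation r whose study is the k-th of the n studies testing that formulation is placed at the position below, where δ is the macro variable &delta (0.2 in this example). The symbols x, r, k and n are notation introduced only for this illustration; in the code they correspond to timept, rx, seq and &&seqnm&i.

```latex
\[
x_{r,k} \;=\; r + \Bigl(k - \tfrac{n}{2} - \tfrac{1}{2}\Bigr)\,\delta
        \;=\; r + \Bigl(k - \tfrac{n+1}{2}\Bigr)\,\delta,
\qquad k = 1,\dots,n .
\]
\[
\delta = 0.2:\quad
n = 2 \;\Rightarrow\; x_{r,k} \in \{\, r-0.1,\; r+0.1 \,\},\qquad
n = 3 \;\Rightarrow\; x_{r,k} \in \{\, r-0.2,\; r,\; r+0.2 \,\}.
\]
```

Because the offsets are symmetric about r, each cluster stays centered on its formulation's tick mark. With tick marks one unit apart and at most five studies, the largest offset is 0.4 in either direction, so neighboring formulations do not overlap.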
In the graphic, each dot is the mean concentration for one study and formulation.

The following SAS code was used to generate the scatter plot.

libname in '.';

proc format;
   value rx    1='formulation 1'
               2='formulation 2'
               3='formulation 3'
               4='formulation 4';
   value study 1='Study1'
               2='Study2'
               3='Study3'
               4='Study4'
               5='Study5';
run;

/* Create data set CMAX */
data cmax;
   infile 'cmax.txt' firstobs=2;
   input study subject $ rx cmax;
   label subject='subject id'
         study='study #'
         rx='formulation'
         cmax='cmax value';
run;

/* Assign macro variable &delta value=0.2, which is the space
   between each subgroup (study) in the graphic */
%let delta=0.2;

/* Start macro CMAX, which creates a graphic */
%macro cmax;

proc sort data=cmax out=cmax;
   by rx study;
run;

/* Generate total number of formulations (&NUMRX) and number of
   studies within each formulation (&&SEQNM&i) */
data seq;
   set cmax(keep=rx study);
   by rx study;
   if last.study;   /* one record per formulation/study pair */
run;

data seq;
   set seq end=eof;
   by rx;
   if first.rx then seq=1;
   else seq+1;
   if last.rx then call symput('seqnm'||left(rx),compress(put(seq,8.)));
   if eof then call symput('numrx',compress(put(rx,8.)));
run;

/* Generate total number of studies (&NUMSTUDY) */
proc means data=seq noprint;
   var study;
   output out=study(drop=_type_ _freq_) max=max;
run;

data _null_;
   set study;
   call symput('numstudy',compress(put(max,8.)));
run;

/********************************************************************
 Create subgroup for each study within each Formulation and create a
 group for each Formulation, where macro variable &delta=0.2, which
 is the space between each subgroup (study), in this example.
*********************************************************************/
data a;
   merge cmax seq;
   by rx study;
run;

data a(keep=subject cmax timept study);
   set a;
   %do i=1 %to &numrx;
      if rx=&i then do;
         %do k=1 %to &&seqnm&i;
            /* center the studies of formulation &i around x=&i */
            if seq=&k then timept=rx+(&k-&&seqnm&i/2-0.5)*&delta;
         %end;
      end;
   %end;
run;

proc sort data=a;
   by timept study;
run;

/* Generate mean of cmax for each study and formulation */
proc means data=a noprint;
   by timept study;
   var cmax;
   output out=mean(drop=_type_ _freq_) mean=m;
run;

/*******************************************************************
 Assign value to macro variable &stdnm for study name used as legend
 value description in Legend statement
******************************************************************/
data _null_;
   do i=1 to &numstudy;
      call symput('stdnm'||left(i), trim(left(put(i, study.))));
   end;
run;

/*******************************************************************
 Assign value to macro variable &rxname for formulation name used as
 the mark values of axis1.
*******************************************************************/
data _null_;
   do i=1 to &numrx;
      call symput('rxname'||left(i), trim(left(put(i, rx.))));
   end;
run;

/* Assign value to macro variable &MAXY for maximum of CMAX */
proc means data=a noprint;
   var cmax;
   output out=maxy(drop=_freq_ _type_) max=max;
run;

data _null_;
   set maxy;
   /* round up to a multiple of 5, since the y-axis is marked by 5 */
   max=max/5;
   max=ceil(max)*5;
   call symput('maxy',compress(put(max,8.)));
run;

/* Set the graphics environment */
goptions ftext=courier htext=2 hby=0 gunit=pct display
         gsfmode=replace rotate=landscape device=png;

/* Define axis characteristics */
axis1 order=(1 to &numrx by 1) origin=(15,15) offset=(10,10)
      minor=none width=2
      value=(h=2
             %do i=1 %to &numrx;
                tick=&i j=c "&&rxname&i"
             %end;
            )
      length=80 label=none;

axis2 order=(0 to &maxy by 5) origin=(15,15) minor=none width=2
      value=(h=2) length=85
      label=(font=centb height=2.5 angle=90 "DPH Cmax (ng/ml)");

/* Define legend characteristics */
legend1 mode=protect origin=(15,1) shape=symbol(0.1,3)
        value=(h=2.0
               %do i=1 %to &numstudy;
                  tick=&i j=c "&&stdnm&i"
               %end;
              )
        frame label=none;

/* Assign value to macro variable &symvalues (value of symbol) */
%let symvalues=square diamond triangle circle +;

/* Assign value to macro variable &symcolors (color of symbol) */
%let symcolors=red green blue lib black;

/* Define symbol characteristics for scatter plot */
%do i=1 %to &numstudy;
   %let color&i=%scan(&symcolors, &i, %str( ));
   %let value&i=%qscan(&symvalues, &i, %str( ));
   symbol&i i=none v=&&value&i c=&&color&i h=5;
%end;

/* Generate scatter plot as fig1 */
proc gplot data=a gout=jane;
   plot cmax*timept=study / legend=legend1 vaxis=axis2 haxis=axis1
                            noframe name='fig1';
run;
quit;

/* Define symbol characteristics for mean plot */
%do i=1 %to &numstudy;
   %let color&i=%scan(&symcolors, &i, %str( ));
   symbol&i i=none v=dot c=&&color&i h=5;
%end;

/* Generate mean plot as fig2 */
proc gplot data=mean gout=jane;
   plot m*timept=study / nolegend vaxis=axis2 haxis=axis1
                         noframe name='fig2';
run;
quit;

/*******************************************************************
 ODS output the graphic as CMAX.RTF file. Let device=png in GOPTIONS.
*******************************************************************/
ods listing close;
ods rtf file="cmax.rtf" style=styles.newrtf;

/* Replay fig1 and fig2 into template */
proc greplay nofs igout=jane;
   tc=temp;
   tdef al0 1 / llx=0   ulx=0   urx=100 lrx=100
                lly=0   uly=100 ury=100 lry=0;
   template al0;
   tplay 1:'fig1' 1:'fig2';
quit;

ods rtf close;
%mend cmax;

/* Calling macro CMAX */
%cmax;

CONCLUSIONS

This paper provides the SAS code for a scatter plot of Cmax values for different formulations of a drug, where the data for each formulation observed across studies are shown as separate clusters. The mean value for each vertical scatter is also shown in the plot. Although this particular example used Cmax data from several pharmacokinetic studies, the same concept may be used in other situations in which visual
inspection of individual data points is desired over more than one dimension, for example by treatment group at each time point of observation.

TRADEMARKS

SAS, SAS/MACRO, and SAS/GRAPH are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

REFERENCE

SAS/GRAPH, SAS/MACRO, Version 8. SAS Institute Inc. (1999), Cary, NC, USA.

The author may be contacted at:

Xiaoli Hu
Wyeth Consumer Healthcare
Five Giralda Farms
Madison, NJ 07940
(973) 660-6547
E-mail: hux@wyeth.com