Inter-Rater Agreement in Medical Studies: Analysis Tools within the SAS System

Similar documents
Teaching statistics and the SAS System

SAS/STAT 13.1 User s Guide. The NESTED Procedure

Spatial Patterns Point Pattern Analysis Geographic Patterns in Areal Data

Graphical Techniques for Displaying Multivariate Data

R FUNCTIONS IN SCRIPT FILE agree.coeff2.r

Biostat Methods STAT 5820/6910 Handout #4: Chi-square, Fisher s, and McNemar s Tests

Picturing Statistics Diana Suhr, University of Northern Colorado

Frequencies, Unequal Variance Weights, and Sampling Weights: Similarities and Differences in SAS

The NESTED Procedure (Chapter)

186 Statistics, Data Analysis and Modeling. Proceedings of MWSUG '95

Want to Do a Better Job? - Select Appropriate Statistical Analysis in Healthcare Research

Chapter 1 Introduction. Chapter Contents

A Generalized Procedure to Create SAS /Graph Error Bar Plots

Online Supplementary Appendix for. Dziak, Nahum-Shani and Collins (2012), Multilevel Factorial Experiments for Developing Behavioral Interventions:

Outline. Topic 16 - Other Remedies. Ridge Regression. Ridge Regression. Ridge Regression. Robust Regression. Regression Trees. Piecewise Linear Model

Contents of SAS Programming Techniques

THIS IS NOT REPRESNTATIVE OF CURRENT CLASS MATERIAL. STOR 455 Midterm 1 September 28, 2010

An introduction to SPSS

Applied Regression Modeling: A Business Approach

SAS CLINICAL SYLLABUS. DURATION: - 60 Hours

STA 570 Spring Lecture 5 Tuesday, Feb 1

Displaying Multiple Graphs to Quickly Assess Patient Data Trends

Binary Diagnostic Tests Clustered Samples

Two useful macros to nudge SAS to serve you

Using PROC SQL to Generate Shift Tables More Efficiently

Statistical Pattern Recognition

A SAS Macro for measuring and testing global balance of categorical covariates

SAS Training BASE SAS CONCEPTS BASE SAS:

Multivariate Capability Analysis

Multiple Regression White paper

Choosing the Right Procedure

VALIDITY OF 95% t-confidence INTERVALS UNDER SOME TRANSECT SAMPLING STRATEGIES

A Format to Make the _TYPE_ Field of PROC MEANS Easier to Interpret Matt Pettis, Thomson West, Eagan, MN

STAT 5200 Handout #25. R-Square & Design Matrix in Mixed Models

Statistics and Data Analysis. Common Pitfalls in SAS Statistical Analysis Macros in a Mass Production Environment

Bland-Altman Plot and Analysis

Module 3: SAS. 3.1 Initial explorative analysis 02429/MIXED LINEAR MODELS PREPARED BY THE STATISTICS GROUPS AT IMM, DTU AND KU-LIFE

Windows Application Using.NET and SAS to Produce Custom Rater Reliability Reports Sailesh Vezzu, Educational Testing Service, Princeton, NJ

Data Analysis and Solver Plugins for KSpread USER S MANUAL. Tomasz Maliszewski

Cluster Randomization Create Cluster Means Dataset

Evaluation Measures. Sebastian Pölsterl. April 28, Computer Aided Medical Procedures Technische Universität München

Ranking Between the Lines

Paper CC-016. METHODOLOGY Suppose the data structure with m missing values for the row indices i=n-m+1,,n can be re-expressed by

Statistics and Data Analysis. Paper

Intermediate SAS: Statistics

Cut Out The Cut And Paste: SAS Macros For Presenting Statistical Output ABSTRACT INTRODUCTION

8. MINITAB COMMANDS WEEK-BY-WEEK

MAT 110 WORKSHOP. Updated Fall 2018

Workload Characterization Techniques

Statistical Pattern Recognition

Simulating Multivariate Normal Data

Statements with the Same Function in Multiple Procedures

Bluman & Mayer, Elementary Statistics, A Step by Step Approach, Canadian Edition

SAS PROC GLM and PROC MIXED. for Recovering Inter-Effect Information

Chapter 13 Multivariate Techniques. Chapter Table of Contents

Quantitative - One Population

Product Catalog. AcaStat. Software

Data Annotations in Clinical Trial Graphs Sudhir Singh, i3 Statprobe, Cary, NC

The DMSPLIT Procedure

Using Templates Created by the SAS/STAT Procedures

SAS Macros CORR_P and TANGO: Interval Estimation for the Difference Between Correlated Proportions in Dependent Samples

Coders' Corner. Paper ABSTRACT GLOBAL STATEMENTS INTRODUCTION

Robust Linear Regression (Passing- Bablok Median-Slope)

Using SAS Macro to Include Statistics Output in Clinical Trial Summary Table

Creating Forest Plots Using SAS/GRAPH and the Annotate Facility

For example, the system. 22 may be represented by the augmented matrix

Paper SDA-11. Logistic regression will be used for estimation of net error for the 2010 Census as outlined in Griffin (2005).

CLUSTER ANALYSIS. V. K. Bhatia I.A.S.R.I., Library Avenue, New Delhi

SAS data statements and data: /*Factor A: angle Factor B: geometry Factor C: speed*/

Statistical Pattern Recognition

Introduction to Statistical Analyses in SAS

Multiple Graphical and Tabular Reports on One Page, Multiple Ways to Do It Niraj J Pandya, CT, USA

Analysis of Variance in R

The STANDARD Procedure

The EMCLUS Procedure. The EMCLUS Procedure

Introduction to SAS. I. Understanding the basics In this section, we introduce a few basic but very helpful commands.

An Efficient Method to Create Titles for Multiple Clinical Reports Using Proc Format within A Do Loop Youying Yu, PharmaNet/i3, West Chester, Ohio

IAT 355 Visual Analytics. Data and Statistical Models. Lyn Bartram

SPSS INSTRUCTION CHAPTER 9

Response to API 1163 and Its Impact on Pipeline Integrity Management

Exploring and Understanding Data Using R.

A SAS Macro to Generate Caterpillar Plots. Guochen Song, i3 Statprobe, Cary, NC

DATA CLASSIFICATORY TECHNIQUES

Acquisition Description Exploration Examination Understanding what data is collected. Characterizing properties of data.

Automating the Plotting of Significant Effects in Analysis of Variance

MIXED_RELIABILITY: A SAS Macro for Estimating Lambda and Assessing the Trustworthiness of Random Effects in Multilevel Models

Laboratory for Two-Way ANOVA: Interactions

Getting to Know Your Data

Clustering and Visualisation of Data

Paper Abstract. Introduction. SAS Version 7/8 Web Tools. Using ODS to Create HTML Formatted Output. Background

Generalized Least Squares (GLS) and Estimated Generalized Least Squares (EGLS)

Applied Multivariate Analysis

Poisson Regressions for Complex Surveys

Module I: Clinical Trials a Practical Guide to Design, Analysis, and Reporting 1. Fundamentals of Trial Design

Chapter 13 Introduction to Graphics Using SAS/GRAPH (Self-Study)

Further Maths Notes. Common Mistakes. Read the bold words in the exam! Always check data entry. Write equations in terms of variables

Lab #9: ANOVA and TUKEY tests

A SAS and Java Application for Reporting Clinical Trial Data. Kevin Kane MSc Infoworks (Data Handling) Limited

An Application of PROC NLP to Survey Sample Weighting

2 = Disagree 3 = Neutral 4 = Agree 5 = Strongly Agree. Disagree

Transcription:

Inter-Rater Agreement in Medical Studies: Analysis Tools within the SAS System Dipl.Ing. Dr. Barbara Schneider, University of Vienna, Austria Abstract In this presentation the intraclass correlation coefficient, a graphical technique to check agreement visually and the κ-measure are discussed. Emphasis is laid on how these statistical methods and techniques can be performed within the SAS System and if they are not available within SAS Software how they can be implemented. Data from clinical trials are presented. SAS code is reported using the following tools: PROC GLM, PROC GPLOT, PROC FREQ, Macro Language and SAS/IML. Introduction Measurement in medical studies is very often an imprecise task and the errors associated with this should be quantified and understood. Since measurement error can seriously affect statistical analyses and interpretation, it is important to assess the amount of such error by calculating a riability index. To investigate agreement in continuous variables a graphical technique proposed by Bland and Altman (1986) and the intraclass correlation coefficient are used. For categorial data the κ-measure is appropriate. Studies comparing two or more methods are common. The aim of these studies is to see if the methods agree well enough for one method to replace the others or for the methods to be used interchangeably. The same considerations apply to studies comparing two or more observers using one method. An important aspect of method or observer comparison is the comparison of the repeatability of each observer or method. Graphical Technique The inter-observer (within observer or between methods) agreement is demonstrated by a graphical technique suggested by Bland and Altman (Lancet, Feb. 8, 1989).

In order to assess inter-observer agreement graphically, the difference between the two measurements of each of the two observers are plotted against the mean of the two values of the observers for each patient. Limits of agreement, defined as twice the standard deviation of the difference between the observers, are calculated and plotted in the figure. If we suppose that these differences follow a normal distribution, 95% of the differences will lie between these limits. To demonstrate the procedure, data from Altman (1991) are used comparing two measurement methods. Measurements of transmitral volumetric flow (MF) by Doppler echocardiography and left ventricular stroke volume (SV) by crosssectional echocardiography in 21 patients without aortic valve disease are shown. The following program implements this technique within the SAS System and produces this plot: DIFF 20 10 0-10 -20 40 50 60 70 80 90 100 110 120 130 140 MEAN

goptions reset=(axis, legend, pattern, symbol, title, footnote) norotate hpos=0 vpos=0 htext= ftext= ctext= target= gaccess= gsfmode= ; title; data ms; input patient mf sv; cards; 1 47 43 2 66 70 3 68 72 4 69 81 5 70 60 6 70 67 7 73 72 8 75 72 9 79 92 10 81 76 11 85 85 12 87 82 13 87 90 14 87 96 15 90 82 16 100 100 17 104 94 18 105 98 19 112 108 20 120 131 21 132 131 ; proc print data=o; data o; set o; d1=m_diff+2*s_diff; d2=m_diff-2*s_diff; call symput("m1",m_diff); call symput("d1",d1); call symput("d2",d2); symbol1 c=black v=circle i=none l=1; symbol2 c=black v=none i=join l=1; proc gplot data=temp; plot diff*mean=1 r*mean=2 / overlay vref= &m1 &d1 &d2 lvref=2 cvref=black caxis=black vaxis=&l to &u by &s ; title; %mend p; %p(mf,sv,-20,20,10); quit; %macro p(x,y,l,u,s); data temp; set ms; mean=mean(&x,&y); diff=&y-&x; r=0; /*in order to show perfect agreement: diff=0 */ proc univariate data=temp normal; var mean diff ; output out=o mean= m_mean m_diff std= s_mean s_diff ;

Intraclass correlation coefficient To quantify agreement the intraclass correlation coefficient (ICC) can be used for continuous variables. Roughly speaking the ICC can be viewed as a ratio of the variance of interest over the sum of the variance of interest plus error. Analysis of variance models are applied to calculate the ICC. To estimate the repeatability of each method or observer a one-way random effect model is performed. The resulting variance components are taken to calculate the ICC. Let x ij denote the ith rating (repetition) (i=1,...,r) on the jth target (e.g. patient) (j=1,...,n). Then the following linear model is assumed : x ij = µ + b j + e ij. In this equation, the component µ is the overall population mean of the ratings. b j is the difference from µ of the mean across the repeated jth target ratings. e ij represents the residual component. The component b j is assumed to vary normally with a mean of zero and a variance of σ T 2. It is also assumed that the e ij terms are distributed independently and normally with a mean of zero and a variance of σ E 2. The ICC takes the form of the following ratio: ICC = σ T 2 / (σ T 2 + σ E 2 ) To estimate the ICC variance components have to be estimated. PROC VARCOMP within the SAS System computes the necessary variance components but unfortunately does not provide an output data set. Therefore PROC GLM has to be used. The next program shows how to produce an estimate of the ICC within the SAS System.

data msu ; set libname.name; %macro i(t,y); data f ; keep t y; set msu; y=&y; /* the variable containing the scores of interest is mapped to an internal variable y */ t= &t; /* the variable containing targets (random effect ) information is mapped to t*/ proc glm data=f outstat=s1 noprint; class t ; model y= t ; random t ; data s1; set s1 ; mss=ss/df; if _TYPE_="SS3" or _TYPE_="ERROR"; proc transpose data=s1 out=o1 prefix=ms_; var mss; id _SOURCE_; proc transpose data=s1 out=o2 prefix=df_; var df; id _SOURCE_; data icc1; merge o1 o2; r_1=(df_error)/(df_t+1); r=r_1+1; vare=ms_error; vart=(ms_t-vare)/r; ICC=vart/(vart+vare); title2 "estimation of the ICC - scoring variable : &y"; proc print data=icc1 noobs; var icc; title2; %mend i; %i(tname,varname); Investigating the agreement between two or more observers or methods a twoway random effects model is appropriate. Let x ijl denote the ith rating (repetition) (i=1,...,r) on the jth target (e.g. patient) (j=1,...,n) of the lth observer (method) (l=1,...,k). Then the following linear model is assumed : x ijl = µ + a l + b j +(ab) lj + e ijl. In this equation, the component µ is the overall population mean of the ratings. a l is the difference from µ of the mean of the lth judge s ratings. b j is defined as before. (ab) lj is the interaction term between judges and targets. e ijl represents the error term. The component a l is assumed to vary normally with a mean of zero

and a variance of σ 2 J. b j is assumed to vary normally with a mean of zero and a variance of σ 2 T. (ab) lj are assumed to be mutually independent with a mean of zero and a variance of σ 2 I It is also assumed that e ijl are distributed independently and normally with a mean of zero and a variance of σ 2 E. Then the ICC takes the form of the following ratio: ICC = σ T 2 / (σ T 2 +σ J 2 + σ I 2 + σ E 2 ) In the absence of repeated ratings by each judge on each target, the interaction and error components cannot be estimated separately. This situation will be taken into account using the program statement if ms_error=. then ms_error=0; The following program shows how to produce an estimate of the ICC within the SAS System : data msu ; set libname.name; %macro i(t,j,y); data f ; keep t j y; set msu; y=&y; t=&t; j=&j; /* variable containing judge information */ proc glm data=f outstat=s2 ; class t j; model y= j t t*j; random t j t*j ; data s2; set s2 ; mss=ss/df; if _TYPE_="SS3" or _TYPE_="ERROR"; proc transpose data=s2 out=o3 prefix=ms_; var mss; id _SOURCE_; id _SOURCE_; data icc2; merge o3 o4; n=df_t+1; k_1=df_j; k=df_j+1; r=(1+df_error/(n*k)); if ms_error=. then ms_error=0; /* formal reasons*/ vare=ms_error; vari=(ms_t_j-vare)/r; varj=(ms_j-ms_t_j)/(r*n); vart=(ms_t-ms_t_j)/(r*k); icc=vart/(vart+varj+vari+vare); title2 estimation of the ICC - scoring variable : &y ; proc print data=icc2 noobs; title2; %mend i; %i(tname,jname,varname); proc transpose data=s2 out=o4 prefix=df_; var df;

Example As before data from measurements of transmitral volumetric flow (MF) by Doppler echocardiography and left ventricular stroke volume (SV) by crosssectional echocardiography in 21 patients without aortic valve disease are taken to estimate the ICC. This is a method comparison analysis. For each patient (target) and per method (judge) only one measurement is taken. In this situation the interaction and the error component cannot be estimated separately as stated above. The dataset ms is multivariate structured. To make the information stored in ms available for computing the ICC it has to be reshaped: data msu ; keep patient methode value; set ms; methode=1; value=mf; output; methode=2; value=sv; output; Running the macro: %i(patient,methode,value); produces the output: estimation of the ICC - scoring variable : value ICC 0.94621 κ-measure

Agreement between categorial assessments is usually considered as a problem of comparing the ability of different raters (observers, judges) to classify subjects (targets) into one of several groups. The investgations outlined below will also apply to method comparison studies for catigorial data; studies that compare two alternative categorization schemes. Categorial data can be formed in a contingency table where the row levels represent the ratings of one observer (method) and the column levels the ratings of onother observer (method). Suppose p ij is the probability of a target being classified in the ith category by n the first observer and the jth category by the second observer. Then p 0 = is p ii i = 1 the probability that the observers agree. If the ratings were independent, the n probability of agreement would be p e = p p. Thus p 0 - p e is the amount of i= 1 i.. i agreement beyond that expected by chance. The κ-measure (κ coefficient) is defined as: κ = p p 0 e. The κ - measure of agreement can be interpreted as 1 pe the chance corrected proportional agreement. A weakness of the κ is that it takes no account of the degree of disagreement i.e. all disagreements are treated (weighted) equally. When the categories are ordered (ordinal scaled data), it is sometimes preferable to put different weights to disagreements according to the magnitude of discrepancy. This idea leads to the so called weighted κ. Weighted κ is obtained by weighting the frequencies in each cell of the table according to their distance from the diagonal that indicates agreement. For a detailed discussion see e.g. Fleiss (1981). PROC FREQ within the SAS System can be used to compute the κ measures. Example Data show the classification by two radiologists of 85 xeromammograms. 0 corresponds to normal, 1 to benign diseas, 2 to suspicion of cancer and 3 to cancer.

/* For n*n Tables */ /* Simple and Weighted kappa */ /* Altman p.403 Bsp. Tab.14.2 */ data k1; input x y weight; cards; 0 0 21 0 1 12 0 2 0 0 3 0 1 0 4 1 1 17 1 2 1 1 3 0 2 0 3 2 1 9 2 2 15 2 3 2 3 0 0 3 1 0 3 2 0 3 3 1 ; proc freq data=k1; tables x*y / agree; weight weight; This results in the output: TABLE OF X BY Y X Y Frequency Percent Row Pct Col Pct 0 1 2 3 Total ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ 0 21 12 0 0 33 24.71 14.12 0.00 0.00 38.82 63.64 36.36 0.00 0.00 75.00 31.58 0.00 0.00 ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ 1 4 17 1 0 22 4.71 20.00 1.18 0.00 25.88 18.18 77.27 4.55 0.00 14.29 44.74 6.25 0.00 ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ 2 3 9 15 2 29 3.53 10.59 17.65 2.35 34.12 10.34 31.03 51.72 6.90 10.71 23.68 93.75 66.67 ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ 3 0 0 0 1 1 0.00 0.00 0.00 1.18 1.18 0.00 0.00 0.00 100.00 0.00 0.00 0.00 33.33 ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ Total 28 38 16 3 85 32.94 44.71 18.82 3.53 100.00 STATISTICS FOR TABLE OF X BY Y Test of Symmetry ---------------- Statistic = 15.400 DF = 6 Prob = 0.017 Kappa Coefficients Statistic Value ASE 95% Confidence Bounds ƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒƒ

Simple Kappa 0.473 0.073 0.330 0.615 Weighted Kappa 0.568 0.068 0.436 0.701 Sample Size = 85 The following IML code shows an alternative approach for computing the κ - measure including estimates of the standard error and 95% confidence limits. Comparing the IML approach to PROC FREQ it provides more flexibility in specifying weights. Another advantage is that the IML program is not restricted to n*n tables. If it seems to make sense r*s tables can be used. Restrictions for data entry are that the values of the score variables have to range from 0 to n and the dimension of the smallest squared table in which the resulting r*s table fits has to be specified. A detailed description of the mathematical formulas used in the IML statements is given in Fleiss (1981). /* Not only for n*n Tables */ /* Weighted kappa */ /* Altman S.403 Example Tab.14.2 */ %let dim=4; /* weights for simple kappa*/ %let w=%str( {1 0 0 0, 0 1 0 0, 0 0 1 0, 0 0 0 1} ); data index; do ind1=1 to &dim; do ind2=1 to &dim; output; end; end; data k1; input x y weight; cards; 0 0 21 0 1 12 0 2 0 0 3 0 1 0 4 1 1 17 1 2 1 1 3 0 2 0 3 2 1 9 2 2 15 2 3 2 3 0 0 3 1 0 3 2 0 3 3 1 ; data k; set k1; drop weight i; do i=1 to weight; output; end; proc freq data=k; This results in the output: tables y*x / out=m1 noprint; proc sort data=m1; by x y; data m1; set m1; ind1=x+1; ind2=y+1; data m2; merge index m1; by ind1 ind2; data m2; keep ind1 ind2 count; set m2; if count=. then count=0; proc transpose data=m2 out=mat prefix=c; var count; id ind2; by ind1; data mat; keep c1-c&dim; set mat; /*** IML ***/ proc iml; w=&w; use mat; read all into x; n=sum(x); x1=x#w; po=(1/n)*sum(x1); csum=x[+,]; rsum=x[,+]; KAPPA 0.4727891 y=rsum*csum; y_=csum*rsum; y=(1/n)*y; p=(1/n)*y; p_=(1/n)*(1/n)*y_; pcsum=p[+,]; prsum=p[,+]; wi=w#prsum; wj=w#pcsum; wqi=wj[,+]; wqj=wi[+,]; wij=j(&dim,&dim); do i=1 to &dim; do j=1 to &dim; wij[i,j]=wqi[i,]+wqj[,j]; end; end; wq=(w-wij)##2; pw=p#wq; pws=sum(pw); y1=y#w; pe=(1/n)*sum(y1); pke=sum(p_); kappa=(po-pe)/(1-pe); print kappa; var0=(1/(n*(1-pke)*(1- pke)))*(pws-pke*pke); std=sqrt(var0); cl_l=kappa-1.96*std; cl_u=kappa+1.96*std; print var0 std; print cl_l cl_u; quit;

VAR0 STD 0.0048129 0.0693751 CL_L CL_U 0.3368139 0.6087643 SAS, Base SAS, SAS/STAT, SAS/GRAPH and SAS/IML software are registered trademarks of SAS Institute Inc., Cary, NC, USA Address for correspondence: Dipl.Ing. Dr. Barbara Schneider Institut für Medizinische Statistik Universität Wien Schwarzspanierstr. 17 A - 1090 Wien Austria

REFERENCES: Bland J.M., Altman D. G. (1986) Statistical Methods for Assessing Agreement between two Methods of Clinical Measurement, THE LANCET, Feb. 8, 1986 Altman D. G. (1991) Practical Statistics for Medical Research, 1st edn. Chapman&Hall Fleiss J. L. et al. (1979), Inter-examiner Reliability in Caries Trials, J Dent Res 58(29:604-609, February 1979 Fleiss J. L. (1981) Statistical Methods for Rates and Proportions, 2nd edn. New York: Wiley Shrout P. E., Fleiss J. L. (1979) Intraclass Correlations: Uses in Assessing Rater Reliability,Psychological Bulletin 1979, Vol. 86, No. 2, 420-428 SAS Language: Reference, Version 6, First Edition SAS Procedures Guide, Version 6, Third Edition SAS/GRAPH Software: Reference, Version 6, First Edition, Volumes 1 and 2 SAS/IML Software: Usage and Reference, Version 6, First Edition SAS/STAT Software: Reference, Version 6, First Edition, Volumes 1 and 2