Two useful macros to nudge SAS to serve you

David Izrael, Michael P. Battaglia, Abt Associates Inc., Cambridge, MA

Abstract

This paper offers two macros that augment the power of two SAS procedures: LOGISTIC and UNIVARIATE. PROC LOGISTIC calculates, among other statistics, several measures that reflect the predictive ability of a logistic regression model: the percentages of concordant, discordant, and tied pairs, as well as four rank correlation indexes (Somers' D, Gamma, Tau-a, and c). The procedure displays them in the Association of Predicted Probabilities and Observed Responses table. In the presence of survey weights, however, the procedure computes those measures ignoring the weights. This makes it difficult for survey researchers to use PROC LOGISTIC to assess the predictive ability of a model, because survey weights are commonly used to analyze survey data. The first macro we offer takes the survey weights into account when computing the association measures and compares the unweighted measures with the ones calculated by the macro.

PROC UNIVARIATE provides five methods for computing quantile statistics. However, these may not be enough if a researcher wants to match SAS statistical computations with those from other statistical packages, or to use SAS to reproduce statistical computations done in another package. For instance, S-PLUS computes quantiles using a different approach. Our second macro computes quantiles following the algorithm used in S-PLUS and compares its results with the respective quantiles produced by PROC UNIVARIATE.

Macro I: to Compute the Weighted Association of Predicted Probabilities and Observed Responses Table

Introduction

The Association of Predicted Probabilities and Observed Responses table lists several measures of association to help a researcher assess the quality of a logistic model. PROC LOGISTIC computes the percentages of concordant, discordant, and tied observations and the number of observation pairs upon which the percentages are based [1]. If a response variable is set to 1 in case of event and 0 in case of non-event, then for all pairs of observations with different values of the response variable, a pair is concordant if the event observation has a higher predicted probability than the non-event observation; a pair is discordant if the event observation has a lower predicted probability than the non-event observation; and if the predicted probabilities are equal for a pair, it is a tie [2]. PROC LOGISTIC computes the percentages of concordant, discordant, and tied pairs along with the total number of pairs. The four rank correlation indexes in the table are computed from the numbers of concordant and discordant pairs of observations by the following formulae:

   Somers' D = (nc - nd) / t                   (1)
   Gamma     = (nc - nd) / (nc + nd)           (2)
   Tau-a     = (nc - nd) / (0.5 N(N - 1))      (3)
   c         = (nc + 0.5(t - nc - nd)) / t     (4)

where N is the total number of observations in the input data set, t is the total number of pairs with different response values, nc is the number of concordant pairs, and nd is the number of discordant pairs [1]. In a relative sense, a model with higher values for these indexes has better predictive ability than a model with lower values for these indexes [2].
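As a quick numerical illustration of formulae (1)-(4), the short data step below evaluates the four indexes for made-up pair counts; the values of nc, nd, t, and N are assumptions invented for the example, not output of any procedure.

/* Hypothetical counts: 100 observations, 60 events and 40 non-events, */
/* so t = 60*40 = 2400 pairs with different response values.           */
data _rank_check;
  nc = 1400;                                /* concordant pairs         */
  nd =  700;                                /* discordant pairs         */
  t  = 2400;                                /* total pairs (incl. ties) */
  N  =  100;                                /* observations             */
  Somers_D = (nc - nd) / t;                 /* formula (1) -> 0.2917    */
  Gamma    = (nc - nd) / (nc + nd);         /* formula (2) -> 0.3333    */
  Tau_a    = (nc - nd) / (0.5*N*(N - 1));   /* formula (3) -> 0.1414    */
  c        = (nc + 0.5*(t - nc - nd)) / t;  /* formula (4) -> 0.6458    */
  put Somers_D= Gamma= Tau_a= c=;
run;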
It turns out, however, that in the presence of a survey weight the LOGISTIC procedure does not work as expected with regard to the computation of the association measures. To test this, we fitted data from an actual survey to a model with just two predictors, in both unweighted and weighted cases. To obtain more detail than the rounded results, we extracted the calculated measures using ODS:

ods listing close;
ods output Association=assocu;
proc logistic descending data=analytic;
  class indep1 indep2;
  model response = indep1 indep2;
run;

ods listing;
proc print data=assocu noobs;
  title3 "Unweighted Association Measures";
run;

ods listing close;
ods output Association=assocw;
proc logistic descending data=analytic;
  class indep1 indep2;
  model response = indep1 indep2;
  weight wgt / norm;
run;

ods listing;
proc print data=assocw noobs;
  title3 "Weighted Association Measures";
run;

The following output shows a complete identity of the weighted and unweighted measures, which casts doubt upon the procedure's ability to correctly compute the association measures in the presence of survey weights.

Unweighted Association Measures

Label1               cValue1     nValue1     Label2       cValue2   nValue2
Percent Concordant   50.6        50.6048     Somers' D    0.178     0.1777
Percent Discordant   32.8        32.8271     Gamma        0.213     0.2130
Percent Tied         16.6        16.567      Tau-a        0.054     0.0544
Pairs                77623704    77623704    c            0.589     0.5889

Weighted Association Measures

Label1               cValue1     nValue1     Label2       cValue2   nValue2
Percent Concordant   50.6        50.6048     Somers' D    0.178     0.1777
Percent Discordant   32.8        32.8271     Gamma        0.213     0.2130
Percent Tied         16.6        16.567      Tau-a        0.054     0.0544
Pairs                77623704    77623704    c            0.589     0.5889

Although a certain difference between the unweighted and weighted measures emerges as the number of predictors in the model increases, the official weighted measures are by no means what we could expect and use for model assessment. We offer here the macro WTAPPOR that does take survey weights into account.

Macro WTAPPOR

The macro uses the same formulae (1)-(4) but in a weighted form. Let the number of event responses in a sample be E and the number of non-event responses be N. The total unweighted number of pairs being considered is E*N. Let us consider the ij-th pair of observations, and let the weight and the predicted probability for the event observation (response is 1) be w_i and p_hat_i, and for the non-event observation (response is 0) be w_j and p_hat_j. If p_hat_i is greater than p_hat_j then, following the definition given in the introduction, the pair is concordant and its weighted representation w_i * w_j is added to the weighted total of concordant pairs. In the same vein, if p_hat_i is lower than p_hat_j, the pair is discordant and its weighted representation w_i * w_j is added to the weighted total of discordant pairs. Finally, if the pair is neither concordant nor discordant, the product w_i * w_j is added to the weighted total of tied pairs. Denoting W_E as the total weighted number of event responses and W_N as the total weighted number of non-event responses, the total weighted number of pairs is calculated as W_E*W_N. Based upon the weighted totals accumulated after E*N iterations, the macro calculates the respective percentages and the correlation indexes by formulae (1)-(4).
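The same accumulation can also be sketched in a single set-based step. The PROC SQL fragment below is only an illustration of the idea, not part of WTAPPOR; the data set probs and the variables response, p_hat, and w (the normalized weight) are assumed names.

/* Sketch only: weighted concordance via a cross join of the event  */
/* and non-event observations. All names here are assumptions. The  */
/* cross join materializes E*N pairs, so it can be slow on large    */
/* samples.                                                         */
proc sql;
  create table _wassoc as
  select sum((e.p_hat > n.p_hat) * e.w * n.w) as wt_concord,
         sum((e.p_hat < n.p_hat) * e.w * n.w) as wt_discord,
         sum((e.p_hat = n.p_hat) * e.w * n.w) as wt_tie
  from probs(where=(response=1)) as e,
       probs(where=(response=0)) as n;
quit;

The three sums add up to W_E*W_N, the weighted number of pairs, and can be plugged directly into the weighted counterparts of formulae (1)-(4).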

The macro reports the correctly calculated weighted measures immediately after the official Association of Predicted Probabilities and Observed Responses table. Exhibit 1 demonstrates the beginning and the end of the listing of the macro run over the survey data set. The logistic model used in the example has 12 categorical independent variables expl1-expl12, dependent variable effect (1,0), and weight wgt, and the macro is called by the following statement:

%wtappor (ds = survey, outds = , weight = wgt, model = expl1-expl12, depvar = effect);

As may be seen from Exhibit 1, there are measurable differences between the official measures and those calculated by the macro WTAPPOR. Note that the official weighted number of pairs, 77131560, is the product of the unweighted (E*N) frequencies of event and non-event responses, 18106 and 4260 respectively, whereas the weighted number of pairs calculated and used by the macro, 79632519, is the product of the total normalized weights (W_E*W_N) for the event and non-event sets, 17922.953 and 4443.047 respectively. The macro itself is presented in Exhibit 2. It is well commented and easy to use.

Exhibit 1.

The LOGISTIC Procedure

Model Information
Data Set                      WORK.ANALYTIC
Response Variable             effect (Positive Effect)
Number of Response Levels     2
Number of Observations        22366
Weight Variable               wgt (Final Weight)
Sum of Weights                22366
Link Function                 Logit
Optimization Technique        Fisher's scoring

Response Profile
Ordered                  Total        Total
  Value    effect    Frequency       Weight
      1         1        18106    17922.953
      2         0         4260     4443.047

NOTE: Weights are normalized to the actual sample size.
...................................................

Official table:

Association of Predicted Probabilities and Observed Responses
Percent Concordant     64.8      Somers' D    0.302
Percent Discordant     34.6      Gamma        0.304
Percent Tied            0.6      Tau-a        0.093
Pairs              77131560      c            0.651

Table calculated by the macro:

Association of Predicted Probabilities and Observed Responses
using normalized weight WGT
Weighted Percent Concordant     66.6      Weighted Somers' D    0.333
Weighted Percent Discordant     33.3      Weighted Gamma        0.333
Weighted Percent Tied            0.1      Weighted Tau-a        0.106
Weighted Pairs              79632519      Weighted c            0.666
......................

Exhibit 2. Macro WTAPPOR

%macro WTAPPOR (
  ds     =,   /* INPUT DATA SET                                     */
  outds  =,   /* OUTPUT DATA SET WITH MEASURES; IF BLANK, JUST
                 DISPLAY THE RESULT                                 */
  weight =,   /* SURVEY WEIGHT                                      */
  model  =,   /* STRING WITH EXPLANATORY VARS; ALL MUST BE
                 CATEGORICAL                                        */
  depvar =    /* DEPENDENT VARIABLE, 1-EVENT, 0-NON-EVENT           */
);

/*** FIT DATA BY LOGISTIC MODEL TO GET PREDICTED PROBABILITIES ***/
proc logistic descending data=&ds;
  weight &weight. / norm;   /* USING NORMALIZED WEIGHT */
  class &model;
  model &depvar = &model;
  output out=_probs(keep=&depvar &weight _p_hat) predicted=_p_hat;
run;

proc sql noprint;
  /* TOTAL WEIGHTED NUMBER OF RECORDS */
  select sum(&weight) into :_tot_wgt from _probs;
  /* TOTAL UNWEIGHTED NUMBER OF RECORDS */
  select count(*) into :_tot_unw from _probs;
  /* TOTAL UNWEIGHTED NUMBER OF NON-EVENTS */
  select count(*) into :_tot_nev from _probs where &depvar=0;
quit;

proc summary data=_probs noprint nway;
  var &weight;
  output out=_out sum=_sumw0;
run;

/* NORMALIZE WEIGHT AND SPLIT INTO EVENT AND NON-EVENT SETS */
data _probs1(rename=(_p_hat=_p_hat1 &weight=_w1)            /* EVENT DATA SET     */
             keep=_p_hat &weight _concord _discord _tie)
     _probs0(rename=(_p_hat=_p_hat0 &weight=_w0)            /* NON-EVENT DATA SET */
             keep=_p_hat &weight);
  set _probs;
  if _n_=1 then set _out;
  &weight = &weight.*&_tot_unw./_sumw0;   /* NORMALIZATION       */
  _concord=0; _discord=0; _tie=0;         /* INITIALIZE MEASURES */
  if &depvar=1 then output _probs1;
  else output _probs0;
run;

/* WEIGHTED TOTAL OF EVENTS */
proc summary data=_probs1 noprint nway;
  var _w1;
  output out=_total1 sum=_total1;
run;

/* WEIGHTED TOTAL OF NON-EVENTS */
proc summary data=_probs0 noprint nway;
  var _w0;
  output out=_total0 sum=_total0;
run;

data _total;   /* DATA SET WITH WEIGHTED TOTALS */
  merge _total1 _total0;
  _total_p = _total1*_total0;
run;

%macro cummsr;
  /* COMPARE EACH EVENT OBSERVATION WITH EACH NON-EVENT OBSERVATION */
  %do _i=1 %to &_tot_nev;
    data _probs1;
      set _probs1;
      if _n_=1 then set _probs0(firstobs=&_i obs=&_i);
      if _p_hat1 < _p_hat0 then
        _discord=_discord+_w0*_w1;          /* ACCRUE DISCORDANCE */
      else if _p_hat1 > _p_hat0 then
        _concord=_concord+_w0*_w1;          /* ACCRUE CONCORDANCE */
      else _tie=_tie+_w0*_w1;               /* ACCRUE TIES        */
      drop _p_hat0 _w0;
    run;
  %end;
%mend cummsr;
%cummsr;

/* SUM CONCORDANCE, DISCORDANCE, AND TIES THROUGH THE WHOLE DATA SET */
proc summary data=_probs1 noprint nway;
  var _concord _discord _tie;
  output out=_out sum=_concord _discord _tie;
run;

/* CALCULATION OF PERCENTAGES AND MEASURES BY FORMULAE (1)-(4) */
data &outds _out(keep=wgt_:);
  merge _out _total;
  Wgt_Percent_Concordant = round(_concord*100/_total_p, .01);
  Wgt_Percent_Discordant = round(_discord*100/_total_p, .01);
  Wgt_Percent_Tied       = round(_tie*100/_total_p, .01);
  Wgt_Pairs    = _total_p;
  Wgt_Somers_D = (_concord - _discord) / _total_p;
  Wgt_Gamma    = (_concord - _discord) / (_concord + _discord);
  Wgt_Tau_a    = (_concord - _discord) / (.5*&_tot_unw.*(&_tot_unw - 1));
  Wgt_c        = (_concord + .5*(_total_p - _concord - _discord)) / _total_p;
run;

/* DISPLAY RESULTS AFTER THE OFFICIAL TABLE */
data _null_;
  set _out;
  file print ls=80 ps=59;
  put "Association of Predicted Probabilities and Observed Responses";
  put "using normalized weight &weight";
  put;
  put "Weighted Percent Concordant " Wgt_Percent_Concordant 5.2
      "   Weighted Somers' D " Wgt_Somers_D 6.4;
  put "Weighted Percent Discordant " Wgt_Percent_Discordant 5.2
      "   Weighted Gamma     " Wgt_Gamma 6.4;
  put "Weighted Percent Tied       " Wgt_Percent_Tied 5.2
      "   Weighted Tau-a     " Wgt_Tau_a 6.4;
  put "Weighted Pairs         " Wgt_Pairs 10.
      "   Weighted c         " Wgt_c 6.4;
run;

%mend wtappor;

Summary

The presented macro, WTAPPOR, is a valuable instrument for a survey researcher to assess the quality of a logistic model when survey weights are present. The macro gives appreciably different measures of association from those calculated by PROC LOGISTIC.

Macro II: Are five methods to compute quantiles enough? If not, get a sixth one.

Introduction

The reader will remember that, using the PCTLDEF= option of PROC UNIVARIATE, one can specify one of five methods for computing quantile statistics.
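For instance, the step below requests quartiles under the fourth definition; the data set work.x and the analysis variable v are assumed names used only for illustration.

/* Illustration only: quartiles of an assumed variable v under        */
/* percentile definition 4; change PCTLDEF= to any of 1-5 to compare. */
proc univariate data=work.x pctldef=4 noprint;
  var v;
  output out=_pct pctlpre=P_ pctlpts=0 25 50 75 100;
run;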

Following the definitions in [3], let n be the number of nonmissing values of a variable and let x_1, ..., x_n represent its ordered values. For the t-th percentile, let p = t/100. For definitions 1, 2, 3, and 5 below, let np = j + g, where j is the integer part and g is the fractional part of np. For definition 4, let (n+1)p = j + g. Then the t-th percentile, y, is defined as follows:

PCTLDEF=1 (weighted average at x_np):
   y = (1 - g) x_j + g x_(j+1), where x_0 is taken to be x_1.

PCTLDEF=2 (observation numbered closest to np):
   y = x_i, where i is the integer part of np + 1/2, if g is not equal to 1/2;
   if g = 1/2, then y = x_j if j is even, or y = x_(j+1) if j is odd.

PCTLDEF=3 (empirical distribution function):
   y = x_j if g = 0; y = x_(j+1) if g > 0.

PCTLDEF=4 (weighted average aimed at x_(p(n+1))):
   y = (1 - g) x_j + g x_(j+1), where x_(n+1) is taken to be x_n.

PCTLDEF=5 (empirical distribution function with averaging):
   y = (x_j + x_(j+1)) / 2 if g = 0; y = x_(j+1) if g > 0.

Researchers often need to match results obtained in SAS with those given by another statistical package, or to reproduce in SAS statistical computations done in another package. If quantiles are involved in those computations, the match may fail because the other package may compute quantiles differently. For example, S-PLUS uses the function quantile(x, p), which computes quantiles at specified probabilities by linear interpolation, using the formula

   quantile(x, p) = [1 - (p(n-1) - floor(p(n-1)))] * x_(1+floor(p(n-1)))
                    + (p(n-1) - floor(p(n-1))) * x_(2+floor(p(n-1)))        (5)

where x_1, ..., x_n is the ordered sample, p is the specified probability, and floor() denotes the floor, or integer part, of its argument [4]. The result of quantile(x, p) will generally not be identical to any of the five methods described above.
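Before turning to the full macro, a minimal single-probability sketch of formula (5) may make the computation easier to follow. The data set work.x, the variable v, and the probability 0.25 are assumptions for illustration; the variable is assumed to have no missing values.

%let p = 0.25;                       /* assumed probability           */

proc sort data=work.x out=_srt;      /* order the sample ascending    */
  by v;
run;

data _null_;
  set _srt end=_last nobs=n;
  retain _lo _hi;
  _j = floor(&p*(n-1));              /* integer part of p(n-1)        */
  _g = &p*(n-1) - _j;                /* fractional part of p(n-1)     */
  if _n_ = _j+1 then _lo = v;        /* x indexed 1+floor(p(n-1))     */
  if _n_ = _j+2 then _hi = v;        /* x indexed 2+floor(p(n-1))     */
  if _last then do;
    if _g = 0 then _hi = _lo;        /* p(n-1) integral: one point    */
    quantile = (1-_g)*_lo + _g*_hi;  /* formula (5)                   */
    put "S-PLUS-style quantile at p=&p: " quantile;
  end;
run;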

Below, we present the macro QUANT6SP, which computes S-PLUS-like quantiles by formula (5), and we compare its results with those obtained by the five methods of PROC UNIVARIATE.

Macro QUANT6SP

The macro presented below is richly supplied with comments and is easy to use.

%macro quant6sp (
  inds =,   /* input data set with the variable of interest          */
  var  =,   /* variable upon which to compute quantiles              */
  ncell=,   /* number of cells whose boundaries are determined by
               the quantiles; 4 for quartiles                        */
  prfx =,   /* prefix we want for the quantile variables             */
  outds=    /* output data set with the quantiles                    */
);

%let _step = %sysevalf(1/&ncell);   /* step between quantile boundaries */

/* create string with the boundaries of the quantiles */
data _temp;
  %macro stq;
    f = "0 "
    %do i=1 %to %eval(&ncell-1);
      || "%sysevalf(&i*&_step)" || ' '
    %end;
    || " 1";
  %mend stq;
  %stq;
run;

data _null_;   /* create macro variable with the string of boundaries */
  set _temp;
  call symput('pctl', left(f));
run;
%put BOUNDARIES OF QUANTILES: &pctl;

proc sort data=&inds(keep=&var) out=_i;   /* order variable ascending */
  by &var;
run;

data _null_;   /* count the records, i.e., the values of the variable */
  set _i end=fin;
  retain _n;
  _n+1;
  if fin then call symput('totn', left(_n));
run;

%do l=1 %to %eval(&ncell+1);
  /* retrieve each boundary and put it into its own macro variable */
  %let p&l = %scan(&pctl, &l, %str( ));
%end;

data &outds(keep=&prfx.:);
  set _i end=_fin;
  retain
  %do j=1 %to %eval(&ncell+1);
    _less&j _greater&j 0
  %end;
  ;
  /* retrieve the values of the variable entering formula (5) */
  %do j=1 %to %eval(&ncell+1);
    if _n_ = 1+floor(%sysevalf(&&p&j*(&totn-1))) then _less&j=&var;
    if _n_ = 2+floor(%sysevalf(&&p&j*(&totn-1))) then _greater&j=&var;
  %end;
  if _fin then do;
    /* compute formula (5) for all boundaries in one pass through the data */
    %do j=1 %to %eval(&ncell+1);
      &prfx&j = (1-(%sysevalf(&&p&j*(&totn-1))
                - floor(%sysevalf(&&p&j*(&totn-1)))))*_less&j
                + (%sysevalf(&&p&j*(&totn-1))
                - floor(%sysevalf(&&p&j*(&totn-1))))*_greater&j;
    %end;
    output;
  end;
run;

proc print;
run;

%mend quant6sp;

Results

Here is an example of a QUANT6SP call that breaks down predicted probabilities into quartiles:

%quant6sp (inds = probs, var = probabs, ncell = 4, outds = out, prfx = method6);

The computed quartiles are shown below:

   0%        25%       50%       75%       100%
0.018626   0.58884   0.65698   0.71963   0.79855

Applying each of the five methods of PROC UNIVARIATE to the same variable probabs, we obtain the following table:

PCTLDEF      0%        25%       50%       75%       100%
   1      0.018626   0.58633   0.65676   0.71951   0.79855
   2      0.018626   0.58633   0.65676   0.71951   0.79855
   3      0.018626   0.58633   0.65676   0.71951   0.79855
   4      0.018626   0.58717   0.65698   0.71987   0.79855
   5      0.018626   0.58800   0.65698   0.71975   0.79855

As shown, none of the five sets of quartiles is identical to the results obtained with the macro QUANT6SP.

References

1. SAS Institute Inc. (1999). SAS/STAT User's Guide, Version 8, Chapter 39. Cary, NC: SAS Institute Inc.
2. SAS Institute Inc. (1995). Logistic Regression Examples Using the SAS System. Cary, NC: SAS Institute Inc.
3. SAS Institute Inc. (1999). SAS Procedures Guide: The UNIVARIATE Procedure. Cary, NC: SAS Institute Inc.
4. Venables, W.N., and Ripley, B.D. (2000). Modern Applied Statistics with S-PLUS. New York: Springer-Verlag.

Contact Information

David Izrael
Abt Associates Inc.
Cambridge, MA 02338
Tel: (617) 349-2434
E-mail: david_izrael@abtassoc.com

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies.