Two useful macros to nudge SAS to serve you
David Izrael, Michael P. Battaglia, Abt Associates Inc., Cambridge, MA

Abstract

This paper offers two macros that augment the power of two SAS procedures: LOGISTIC and UNIVARIATE. PROC LOGISTIC calculates, among other statistics, several measures that reflect the predictive ability of a logistic regression model: the percentages of concordant, discordant, and tied pairs, as well as four rank correlation indexes, Somers' D, Gamma, Tau-a, and c. The procedure displays them in the Association of Predicted Probabilities and Observed Responses table. In the presence of survey weights, however, the procedure computes those measures ignoring the weights. Because survey data are commonly analyzed with survey weights, this makes it difficult for survey researchers to use PROC LOGISTIC to assess the predictive ability of a model. The first macro we offer takes the survey weights into account when computing these association measures and compares the unweighted measures with those calculated by the macro.

PROC UNIVARIATE provides five methods for computing quantile statistics. However, these may not be enough if a researcher wants to match SAS results with those from other statistical packages, or to use SAS to reproduce computations done in another package. For instance, S-PLUS computes quantiles using a different approach. Our second macro computes quantiles following the algorithm used in S-PLUS and compares its results with the respective quantiles produced by PROC UNIVARIATE.

Macro I: To Compute the Weighted Association of Predicted Probabilities and Observed Responses Table

Introduction

The Association of Predicted Probabilities and Observed Responses table lists several measures of association that help a researcher assess the quality of a logistic model.
PROC LOGISTIC computes the percentages of concordant, discordant, and tied pairs and the number of observation pairs upon which the percentages are based [1]. If the response variable is set to 1 for an event and 0 for a non-event, then, among all pairs of observations with different values of the response variable, a pair is concordant if the event observation has a higher predicted probability than the non-event observation, discordant if the event observation has a lower predicted probability than the non-event observation, and tied if the two predicted probabilities are equal [2]. The four rank correlation indexes in the table are computed from the numbers of concordant and discordant pairs by the following formulae:

$$\text{Somers' } D = \frac{n_c - n_d}{t} \tag{1}$$

$$\text{Gamma} = \frac{n_c - n_d}{n_c + n_d} \tag{2}$$

$$\text{Tau-}a = \frac{n_c - n_d}{0.5\,N(N-1)} \tag{3}$$

$$c = \frac{n_c + 0.5\,(t - n_c - n_d)}{t} \tag{4}$$

where $N$ is the total number of observations in the input data set, $t$ is the total number of pairs with different response values, $n_c$ is the number of concordant pairs, and $n_d$ is the number of discordant pairs [1]. In a relative sense, a model with higher values for these indexes has better predictive ability than a model with lower values [2].

It turns out, however, that in the presence of survey weights the LOGISTIC procedure does not work as expected with regard to the computation of the association measures. To test this, we fitted a model with just two predictors to data from an actual survey, once unweighted and once weighted. To obtain more detail than the rounded printed results, we extracted the calculated measures using ODS:

```sas
ods listing close;
ods output Association=assocu;
proc logistic descending data=analytic;
  class indep1 indep2;
  model response = indep1 indep2;
run;
ods listing;
proc print data=assocu noobs;
  title3 "Unweighted Association Measures";
run;

ods listing close;
ods output Association=assocw;
proc logistic descending data=analytic;
  class indep1 indep2;
  model response = indep1 indep2;
  weight wgt / norm;
run;
ods listing;
proc print data=assocw noobs;
  title3 "Weighted Association Measures";
run;
```

The following output shows that the weighted and unweighted measures are completely identical, which casts doubt upon the procedure's ability to correctly compute the association measures in the presence of survey weights.

Unweighted Association Measures
[PROC PRINT listing with columns Label1, cValue1, nValue1, Label2, cValue2, nValue2; rows Percent Concordant / Somers' D, Percent Discordant / Gamma, Percent Tied / Tau-a, Pairs / c. Numeric values not preserved.]

Weighted Association Measures
[PROC PRINT listing with the same columns and rows as above. Numeric values not preserved.]

Although a certain difference between unweighted and weighted measures emerges as the number of predictors in the model increases, the official weighted measures are by no means what we could expect and use for model assessment.

Macro WTAPPOR

We offer here the macro WTAPPOR, which does take survey weights into account. The macro uses the same formulae (1)-(4), but in a weighted form. Let the number of event responses in a sample be $E$ and the number of non-event responses be $N$ (here $N$ counts non-events only, unlike the total sample size in formula (3)). The total unweighted number of pairs being considered is $E \times N$. Consider the $ij$-th pair of observations, and let the weight and the predicted probability be $w_i$ and $\hat{p}_i$ for the event observation (response is 1) and $w_j$ and $\hat{p}_j$ for the non-event observation (response is 0). If $\hat{p}_i$ is greater than $\hat{p}_j$, then, following the definition given in the introduction, the pair is concordant and its weighted representation $w_i w_j$ is added to the weighted total of concordant pairs. In the same vein, if $\hat{p}_i$ is lower than $\hat{p}_j$, the pair is discordant and $w_i w_j$ is added to the weighted total of discordant pairs. Finally, if the pair is neither concordant nor discordant, $w_i w_j$ is added to the weighted total of tied pairs. Denoting by $W_E$ the total weighted number of event responses and by $W_N$ the total weighted number of non-event responses, the total weighted number of pairs is $W_E W_N$. Based upon the weighted totals accumulated over the $E \times N$ comparisons, the macro calculates the respective percentages and the correlation indexes by formulae (1)-(4). The macro reports the correctly calculated weighted measures immediately after the official Association of Predicted Probabilities and Observed Responses table.

Exhibit 1 shows the beginning and the end of the listing produced when the macro was run on the survey data set. The logistic model used in the example has 12 categorical independent variables expl1-expl12, the dependent variable effect (1, 0), and the weight wgt; the macro is called by the following statement:

```sas
%wtappor(ds=survey, outds=, weight=wgt, model=expl1-expl12, depvar=effect);
```

As may be seen from Exhibit 1, there are measurable differences between the official measures and those calculated by the macro WTAPPOR. Note that the official weighted number of pairs is the product of the unweighted frequencies of event and non-event responses ($E \times N$, with 4260 non-event responses), whereas the weighted number of pairs calculated and used by the macro is the product of the total normalized weights of the event and non-event sets ($W_E W_N$).

The macro itself is presented in Exhibit 2. It is well commented and easy to use.
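The pair-by-pair logic of WTAPPOR can be cross-checked outside SAS. The sketch below is a plain-Python illustration, not part of the original paper: the function name weighted_association and its argument layout are hypothetical, and it assumes complete data (no missing predicted probabilities) and 0/1 responses. Like the macro, it rescales the weights to sum to the sample size, adds $w_i w_j$ to the concordant, discordant, or tied total for every event/non-event pair, and then applies formulae (1)-(4) with $t$ replaced by $W_E W_N$.

```python
from itertools import product


def weighted_association(y, p_hat, w):
    """Weighted concordance measures in the spirit of WTAPPOR (hypothetical helper).

    y     -- 0/1 responses (1 = event, 0 = non-event)
    p_hat -- predicted event probabilities from the fitted model
    w     -- survey weights; rescaled here to sum to the sample size
    """
    n = len(y)
    scale = n / sum(w)  # normalization, as in the macro
    events = [(p, wi * scale) for yi, p, wi in zip(y, p_hat, w) if yi == 1]
    nonevents = [(p, wi * scale) for yi, p, wi in zip(y, p_hat, w) if yi == 0]

    concord = discord = tied = 0.0
    # Compare every event observation i with every non-event observation j (E*N pairs).
    for (p_i, w_i), (p_j, w_j) in product(events, nonevents):
        if p_i > p_j:
            concord += w_i * w_j
        elif p_i < p_j:
            discord += w_i * w_j
        else:
            tied += w_i * w_j

    # Total weighted number of pairs, W_E * W_N.
    pairs = sum(wi for _, wi in events) * sum(wj for _, wj in nonevents)
    return {
        "pct_concordant": 100 * concord / pairs,
        "pct_discordant": 100 * discord / pairs,
        "pct_tied": 100 * tied / pairs,
        "pairs": pairs,
        "somers_d": (concord - discord) / pairs,                     # formula (1)
        "gamma": (concord - discord) / (concord + discord),          # formula (2)
        "tau_a": (concord - discord) / (0.5 * n * (n - 1)),          # formula (3)
        "c": (concord + 0.5 * (pairs - concord - discord)) / pairs,  # formula (4)
    }


if __name__ == "__main__":
    y = [1, 1, 0, 0]
    p_hat = [0.9, 0.4, 0.4, 0.2]
    print(weighted_association(y, p_hat, [1, 1, 1, 1]))  # unit weights: unweighted measures
    print(weighted_association(y, p_hat, [2, 1, 1, 1]))  # first event weighted more heavily
```

With all weights equal to 1 the function reduces to the unweighted definitions, which is a quick sanity check. As in the macro, Tau-a keeps the unweighted sample size in its denominator; this is consistent because the normalized weights sum to that size. The explicit double loop mirrors the macro's $E \times N$ comparisons; for large samples, sorting the non-events by predicted probability and using cumulative weight sums would avoid the quadratic cost.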
Exhibit 1. Association of Predicted Probabilities and Observed Responses

```
The LOGISTIC Procedure

Model Information
Data Set                     WORK.ANALYTIC
Response Variable            effect        Positive Effect
Number of Response Levels    2
Number of Observations
Weight Variable              wgt           Final Weight
Sum of Weights
Link Function                Logit
Optimization Technique       Fisher's scoring

Response Profile
Ordered                   Total        Total
  Value     effect    Frequency       Weight

NOTE: Weights are normalized to the actual sample size.

Official table
Association of Predicted Probabilities and Observed Responses
Percent Concordant     64.8    Somers' D
Percent Discordant     34.6    Gamma
Percent Tied            0.6    Tau-a
Pairs                          c

Table calculated by the Macro
Association of Predicted Probabilities and Observed Responses
using normalized weight WGT
Weighted Percent Concordant   66.6    Weighted Somers' D
Weighted Percent Discordant   33.3    Weighted Gamma
Weighted Percent Tied          0.1    Weighted Tau-a
Weighted Pairs                        Weighted c
```

Exhibit 2. Macro WTAPPOR

```sas
%macro wtappor(ds=,      /* INPUT DATA SET                                             */
               outds=,   /* OUTPUT DATA SET WITH MEASURES; IF BLANK, JUST DISPLAY      */
               weight=,  /* SURVEY WEIGHT                                              */
               model=,   /* STRING WITH EXPLANATORY VARIABLES; ALL MUST BE CATEGORICAL */
               depvar=   /* DEPENDENT VARIABLE: 1 - EVENT, 0 - NON-EVENT               */
              );
  %local i _tot_wgt _tot_unw _tot_nev;

  /*** FIT DATA BY LOGISTIC MODEL TO GET PREDICTED PROBABILITIES ***/
  proc logistic descending data=&ds;
    weight &weight / norm;               /* USING NORMALIZED WEIGHT */
    class &model;
    model &depvar = &model;
    output out=_probs(keep=&depvar &weight _p_hat) predicted=_p_hat;
  run;

  proc sql noprint;
    /* TOTAL WEIGHTED NUMBER OF RECORDS */
    select sum(&weight) into :_tot_wgt from _probs;
    /* TOTAL UNWEIGHTED NUMBER OF RECORDS */
    select count(*) into :_tot_unw from _probs;
    /* TOTAL UNWEIGHTED NUMBER OF NON-EVENTS */
    select count(*) into :_tot_nev from _probs where &depvar=0;
  quit;

  proc summary data=_probs noprint nway;
    var &weight;
    output out=_out sum=_sumw0;
  run;

  /* SPLIT INTO EVENT AND NON-EVENT DATA SETS AND NORMALIZE WEIGHT.   */
  /* KEEP= DATA SET OPTIONS (NOT KEEP STATEMENTS, WHICH APPLY TO ALL  */
  /* OUTPUT DATA SETS) GIVE EACH DATA SET ITS OWN VARIABLE LIST       */
  data _probs1(keep=_p_hat &weight _concord _discord _tie
               rename=(_p_hat=_p_hat1 &weight=_w1))    /* EVENT DATA SET      */
       _probs0(keep=_p_hat &weight
               rename=(_p_hat=_p_hat0 &weight=_w0));   /* NON-EVENT DATA SET  */
    set _probs;
    if _n_=1 then set _out;
    &weight = &weight * &_tot_unw / _sumw0;            /* NORMALIZATION       */
    _concord=0; _discord=0; _tie=0;                    /* INITIALIZE MEASURES */
    if &depvar=1 then output _probs1;
    else if &depvar=0 then output _probs0;
  run;

  /* WEIGHTED TOTAL OF EVENTS */
  proc summary data=_probs1 noprint nway;
    var _w1;
    output out=_total1 sum=_total1;
  run;
  /* WEIGHTED TOTAL OF NON-EVENTS */
  proc summary data=_probs0 noprint nway;
    var _w0;
    output out=_total0 sum=_total0;
  run;
  data _total;                       /* DATA SET WITH WEIGHTED TOTAL OF PAIRS */
    merge _total1 _total0;
    _total_p = _total1 * _total0;
  run;

  /* COMPARE EACH EVENT OBSERVATION WITH EACH NON-EVENT OBSERVATION;  */
  /* THE I-TH NON-EVENT RECORD IS READ ONCE AND RETAINED FOR THE PASS */
  %do i=1 %to &_tot_nev;
    data _probs1;
      set _probs1;
      if _n_=1 then set _probs0(firstobs=&i obs=&i);
      if _p_hat1 < _p_hat0 then _discord = _discord + _w0*_w1;       /* ACCRUE DISCORD */
      else if _p_hat1 > _p_hat0 then _concord = _concord + _w0*_w1;  /* ACCRUE CONCORD */
      else _tie = _tie + _w0*_w1;                                    /* ACCRUE TIES    */
      drop _p_hat0 _w0;
    run;
  %end;

  /* SUM CONCORDANCE, DISCORDANCE AND TIES THROUGH THE WHOLE DATA SET */
  proc summary data=_probs1 noprint nway;
    var _concord _discord _tie;
    output out=_out sum=_concord _discord _tie;
  run;

  /* CALCULATION OF PERCENTAGES AND MEASURES BY FORMULAE (1)-(4) */
  data &outds _out(keep=wgt_:);
    merge _out _total;
    Wgt_Percent_Concordant = round(_concord*100/_total_p, .01);
    Wgt_Percent_Discordant = round(_discord*100/_total_p, .01);
    Wgt_Percent_Tied       = round(_tie*100/_total_p, .01);
    Wgt_Pairs    = _total_p;
    Wgt_Somers_D = (_concord - _discord) / _total_p;
    Wgt_Gamma    = (_concord - _discord) / (_concord + _discord);
    Wgt_Tau_a    = (_concord - _discord) / (.5*&_tot_unw*(&_tot_unw - 1));
    Wgt_c        = (_concord + .5*(_total_p - _concord - _discord)) / _total_p;
  run;

  /* DISPLAY RESULTS AFTER THE OFFICIAL TABLE */
  data _null_;
    set _out;
    file print ls=80 ps=59;
    put "Association of Predicted Probabilities and Observed Responses";
    put "using normalized weight &weight";
    put;
    put "Weighted Percent Concordant " Wgt_Percent_Concordant 5.2
        "   Weighted Somers' D " Wgt_Somers_D 6.4;
    put "Weighted Percent Discordant " Wgt_Percent_Discordant 5.2
        "   Weighted Gamma     " Wgt_Gamma 6.4;
    put "Weighted Percent Tied       " Wgt_Percent_Tied 5.2
        "   Weighted Tau-a     " Wgt_Tau_a 6.4;
    put "Weighted Pairs " Wgt_Pairs 10.
        "   Weighted c         " Wgt_c 6.4;
  run;
%mend wtappor;
```

Summary
The presented macro, WTAPPOR, is a valuable instrument for survey researchers who need to assess the quality of a logistic model when survey weights are present. The macro gives appreciably different measures of association from those calculated by PROC LOGISTIC.

Macro II: Are five methods to compute quantiles enough? If not, get a sixth one.

Introduction

The reader will remember that with the PCTLDEF= option in PROC UNIVARIATE, one can specify one of five methods for computing quantile statistics. Following the definitions in [3], let $n$ be the number of nonmissing values for a variable and let $x_1, \ldots, x_n$ represent the ordered values of the variable. For the $t$th percentile, let $p = t/100$. For definitions 1, 2, 3, and 5 below, let $np = j + g$, where $j$ is the integer part and $g$ is the fractional part of $np$. For definition 4, let $(n+1)p = j + g$. Then the $t$th percentile, $y$, is defined as follows:

PCTLDEF=1, weighted average at $x_{np}$: $y = (1-g)\,x_j + g\,x_{j+1}$, where $x_0$ is taken to be $x_1$.

PCTLDEF=2, observation numbered closest to $np$: $y = x_i$, where $i$ is the integer part of $np + \tfrac{1}{2}$ if $g \neq \tfrac{1}{2}$. If $g = \tfrac{1}{2}$, then $y = x_j$ if $j$ is even, or $y = x_{j+1}$ if $j$ is odd.

PCTLDEF=3, empirical distribution function: $y = x_j$ if $g = 0$; $y = x_{j+1}$ if $g > 0$.

PCTLDEF=4, weighted average aimed at $x_{p(n+1)}$: $y = (1-g)\,x_j + g\,x_{j+1}$, where $x_{n+1}$ is taken to be $x_n$.

PCTLDEF=5, empirical distribution function with averaging: $y = (x_j + x_{j+1})/2$ if $g = 0$; $y = x_{j+1}$ if $g > 0$.

Researchers often need to match results obtained by SAS with those given by another statistical package, or to reproduce with SAS statistical computations done in another package. If quantiles are involved in those computations, matching may fail because the other package may compute quantiles differently. For example, S-PLUS uses the function quantile(x, p), which computes quantiles at specified probabilities by linear interpolation, using the formula

$$\operatorname{quantile}(x, p) = \bigl[1 - \bigl(p(n-1) - \lfloor p(n-1) \rfloor\bigr)\bigr]\, x_{1+\lfloor p(n-1) \rfloor} + \bigl[p(n-1) - \lfloor p(n-1) \rfloor\bigr]\, x_{2+\lfloor p(n-1) \rfloor} \tag{5}$$

where $x_1, \ldots, x_n$ is the ordered sample, $p$ is the specified probability, and $\lfloor \cdot \rfloor$ denotes the floor (integer part) of its argument [4]. The result of quantile(x, p) will generally not be identical to any of the five methods described above. Below, we present the macro QUANT6SP, which computes S-PLUS-like quantiles by formula (5), and compare its results with those obtained by the five methods of PROC UNIVARIATE.

Macro QUANT6SP

The macro presented below is richly commented and easy to use.
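The following plain-Python sketch is not the macro; it is an independent cross-check, not part of the original paper, that implements formula (5) next to the five PCTLDEF= definitions quoted above from [3], so one can see on a small sample why they disagree. The function names pctldef and splus_quantile are hypothetical. Two choices are assumptions: order-statistic indexes are clamped to 1..n, which covers the stated conventions ($x_0$ taken as $x_1$, $x_{n+1}$ as $x_n$) and is only assumed to match SAS at other boundaries; and $np$ is computed with exact fractions so that, for example, $np = 3$ is not perturbed by floating-point error.

```python
import math
from fractions import Fraction


def pctldef(x, t, method):
    """t-th percentile of x under PCTLDEF= definitions 1-5 as quoted in the text."""
    xs = sorted(float(v) for v in x)
    n = len(xs)

    def at(k):
        # 1-based order statistic x_k; clamping k to 1..n is an assumption that
        # covers "x_0 is taken to be x_1" and "x_{n+1} is taken to be x_n".
        return xs[min(max(k, 1), n) - 1]

    # Exact arithmetic for np (or (n+1)p for definition 4).
    np_ = Fraction((n + 1) * t, 100) if method == 4 else Fraction(n * t, 100)
    j = math.floor(np_)   # integer part
    g = np_ - j           # fractional part

    if method in (1, 4):  # weighted average at x_np or x_(n+1)p
        return (1 - g) * at(j) + g * at(j + 1)
    if method == 2:       # observation numbered closest to np
        if g != Fraction(1, 2):
            return at(math.floor(np_ + Fraction(1, 2)))
        return at(j) if j % 2 == 0 else at(j + 1)
    if method == 3:       # empirical distribution function
        return at(j) if g == 0 else at(j + 1)
    if method == 5:       # empirical distribution function with averaging
        return (at(j) + at(j + 1)) / 2 if g == 0 else at(j + 1)
    raise ValueError("method must be 1, 2, 3, 4, or 5")


def splus_quantile(x, p):
    """Formula (5): linear interpolation at position 1 + p(n-1) of the ordered sample."""
    xs = sorted(float(v) for v in x)
    n = len(xs)
    h = p * (n - 1)
    k = math.floor(h)     # floor(p(n-1))
    g = h - k
    lower = xs[k]                               # x_{1+floor(p(n-1))}, 1-based
    upper = xs[k + 1] if k + 1 < n else lower   # x_{2+floor(p(n-1))}; weight 0 when p = 1
    return (1 - g) * lower + g * upper


if __name__ == "__main__":
    data = range(1, 11)
    print("formula (5):", [splus_quantile(data, p) for p in (0, 0.25, 0.5, 0.75, 1)])
    for m in range(1, 6):
        print(f"PCTLDEF={m}:", [pctldef(data, t, m) for t in (0, 25, 50, 75, 100)])
```

For the values 1 through 10, formula (5) gives a first quartile of 3.25, while the five PCTLDEF= definitions give 2.5, 2, 3, 2.75, and 3. For interior cut points like these, Python's statistics.quantiles(x, n=4, method='inclusive') uses the same linear interpolation as formula (5), which offers a second independent check.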
```sas
%macro quant6sp(inds=,   /* INPUT DATA SET WITH VARIABLE OF INTEREST                      */
                var=,    /* VARIABLE UPON WHICH TO COMPUTE QUANTILES                      */
                ncell=,  /* NUMBER OF CELLS WHOSE BOUNDARIES ARE DETERMINED BY QUANTILES; */
                         /* 4 - FOR QUARTILES                                             */
                prfx=,   /* PREFIX FOR THE QUANTILE VARIABLES                             */
                outds=   /* DATA SET WITH QUANTILES                                       */
               );
  %local i j l _step pctl totn;

  /* CREATE STRING WITH BOUNDARIES OF QUANTILES, E.G. 0 0.25 0.5 0.75 1 */
  %let _step = %sysevalf(1/&ncell);
  %let pctl = 0;
  %do i=1 %to %eval(&ncell-1);
    %let pctl = &pctl %sysevalf(&i*&_step);
  %end;
  %let pctl = &pctl 1;
  %put BOUNDARIES OF QUANTILES: &pctl;

  /* ORDER THE NONMISSING VALUES OF THE VARIABLE IN ASCENDING ORDER */
  proc sort data=&inds(keep=&var where=(not missing(&var))) out=_i;
    by &var;
  run;

  /* NUMBER OF RECORDS, THAT IS, OF NONMISSING VALUES OF THE VARIABLE */
  data _null_;
    set _i end=fin;
    _n + 1;
    if fin then call symputx('totn', _n);
  run;

  /* RETRIEVE EACH BOUNDARY AND PUT IT INTO ITS OWN MACRO VARIABLE */
  %do l=1 %to %eval(&ncell+1);
    %local p&l;
    %let p&l = %scan(&pctl, &l, %str( ));
  %end;

  data &outds(keep=&prfx.:);
    set _i end=_fin;
    retain %do j=1 %to %eval(&ncell+1); _less&j _greater&j %end; 0;
    /* RETRIEVE THE VALUES OF THE VARIABLE NEEDED BY FORMULA (5) */
    %do j=1 %to %eval(&ncell+1);
      if _n_ = 1+floor(%sysevalf(&&p&j*(&totn-1))) then _less&j = &var;
      if _n_ = 2+floor(%sysevalf(&&p&j*(&totn-1))) then _greater&j = &var;
    %end;
    /* COMPUTE FORMULA (5) FOR ALL BOUNDARIES IN ONE PASS THROUGH THE DATA */
    if _fin then do;
      %do j=1 %to %eval(&ncell+1);
        &prfx&j = (1 - (%sysevalf(&&p&j*(&totn-1)) - floor(%sysevalf(&&p&j*(&totn-1))))) * _less&j
                + (%sysevalf(&&p&j*(&totn-1)) - floor(%sysevalf(&&p&j*(&totn-1)))) * _greater&j;
      %end;
      output;
    end;
  run;

  proc print data=&outds;
  run;
%mend quant6sp;
```

Results

Here we present an example of a QUANT6SP call that breaks down predicted probabilities by quartiles:

```sas
%quant6sp(inds=probs, var=probabs, ncell=4, outds=out, prfx=method6);
```

The computed quartiles are shown below:

[Table: quartiles of probabs computed by QUANT6SP, with columns 0%, 25%, 50%, 75%, 100%. Numeric values not preserved.]

Applying each of the five methods of PROC UNIVARIATE to the same variable probabs, we obtain the following table:

[Table: one row per PCTLDEF value (1-5), with columns 0%, 25%, 50%, 75%, 100%. Numeric values not preserved.]

As shown, none of the five sets of quartiles above is identical to the results obtained using the macro QUANT6SP.

References

1. SAS Institute Inc. (1999). SAS/STAT, Version 8, Chapter 39. Cary, NC: SAS Institute Inc.
2. Logistic Regression Examples Using the SAS System. SAS Institute Inc.
3. SAS Institute Inc. (1999). SAS/BASE, SAS Procedures Guide, PROC UNIVARIATE.
4. Venables, W.N., and Ripley, B.D. (2000). Modern Applied Statistics with S-PLUS. New York: Springer-Verlag.

Contact Information

David Izrael
Abt Associates Inc.
Cambridge, MA
tel: (617)
david_izrael@abtassoc.com

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies.
More informationTales from the Help Desk 6: Solutions to Common SAS Tasks
SESUG 2015 ABSTRACT Paper BB-72 Tales from the Help Desk 6: Solutions to Common SAS Tasks Bruce Gilsen, Federal Reserve Board, Washington, DC In 30 years as a SAS consultant at the Federal Reserve Board,
More informationSAS/STAT 15.1 User s Guide The STDIZE Procedure
SAS/STAT 15.1 User s Guide The STDIZE Procedure This document is an individual chapter from SAS/STAT 15.1 User s Guide. The correct bibliographic citation for this manual is as follows: SAS Institute Inc.
More information%MISSING: A SAS Macro to Report Missing Value Percentages for a Multi-Year Multi-File Information System
%MISSING: A SAS Macro to Report Missing Value Percentages for a Multi-Year Multi-File Information System Rushi Patel, Creative Information Technology, Inc., Arlington, VA ABSTRACT It is common to find
More information- 1 - Fig. A5.1 Missing value analysis dialog box
WEB APPENDIX Sarstedt, M. & Mooi, E. (2019). A concise guide to market research. The process, data, and methods using SPSS (3 rd ed.). Heidelberg: Springer. Missing Value Analysis and Multiple Imputation
More information2 = Disagree 3 = Neutral 4 = Agree 5 = Strongly Agree. Disagree
PharmaSUG 2012 - Paper HO01 Multiple Techniques for Scoring Quality of Life Questionnaires Brandon Welch, Rho, Inc., Chapel Hill, NC Seungshin Rhee, Rho, Inc., Chapel Hill, NC ABSTRACT In the clinical
More informationValidation Summary using SYSINFO
Validation Summary using SYSINFO Srinivas Vanam Mahipal Vanam Shravani Vanam Percept Pharma Services, Bridgewater, NJ ABSTRACT This paper presents a macro that produces a Validation Summary using SYSINFO
More informationfootnote1 height=8pt j=l "(Rev. &sysdate)" j=c "{\b\ Page}{\field{\*\fldinst {\b\i PAGE}}}";
Producing an Automated Data Dictionary as an RTF File (or a Topic to Bring Up at a Party If You Want to Be Left Alone) Cyndi Williamson, SRI International, Menlo Park, CA ABSTRACT Data dictionaries are
More informationTasks Menu Reference. Introduction. Data Management APPENDIX 1
229 APPENDIX 1 Tasks Menu Reference Introduction 229 Data Management 229 Report Writing 231 High Resolution Graphics 232 Low Resolution Graphics 233 Data Analysis 233 Planning Tools 235 EIS 236 Remote
More informationSAS Example A10. Output Delivery System (ODS) Sample Data Set sales.txt. Examples of currently available ODS destinations: Mervyn Marasinghe
SAS Example A10 data sales infile U:\Documents\...\sales.txt input Region : $8. State $2. +1 Month monyy5. Headcnt Revenue Expenses format Month monyy5. Revenue dollar12.2 proc sort by Region State Month
More informationAnalysis of Complex Survey Data with SAS
ABSTRACT Analysis of Complex Survey Data with SAS Christine R. Wells, Ph.D., UCLA, Los Angeles, CA The differences between data collected via a complex sampling design and data collected via other methods
More informationSubmitting SAS Code On The Side
ABSTRACT PharmaSUG 2013 - Paper AD24-SAS Submitting SAS Code On The Side Rick Langston, SAS Institute Inc., Cary NC This paper explains the new DOSUBL function and how it can submit SAS code to run "on
More informationMacros to Report Missing Data: An HTML Data Collection Guide Patrick Thornton, University of California San Francisco, SF, California
Macros to Report Missing Data: An HTML Data Collection Guide Patrick Thornton, University of California San Francisco, SF, California ABSTRACT This paper presents SAS macro programs that calculate missing
More informationQuick and Efficient Way to Check the Transferred Data Divyaja Padamati, Eliassen Group Inc., North Carolina.
ABSTRACT PharmaSUG 2016 - Paper QT03 Quick and Efficient Way to Check the Transferred Data Divyaja Padamati, Eliassen Group Inc., North Carolina. Consistency, quality and timelines are the three milestones
More informationStat 5100 Handout #19 SAS: Influential Observations and Outliers
Stat 5100 Handout #19 SAS: Influential Observations and Outliers Example: Data collected on 50 countries relevant to a cross-sectional study of a lifecycle savings hypothesis, which states that the response
More informationContents of SAS Programming Techniques
Contents of SAS Programming Techniques Chapter 1 About SAS 1.1 Introduction 1.1.1 SAS modules 1.1.2 SAS module classification 1.1.3 SAS features 1.1.4 Three levels of SAS techniques 1.1.5 Chapter goal
More informationTRANSFORMING MULTIPLE-RECORD DATA INTO SINGLE-RECORD FORMAT WHEN NUMBER OF VARIABLES IS LARGE.
TRANSFORMING MULTIPLE-RECORD DATA INTO SINGLE-RECORD FORMAT WHEN NUMBER OF VARIABLES IS LARGE. David Izrael, Abt Associates Inc., Cambridge, MA David Russo, Independent Consultant ABSTRACT In one large
More informationIndenting with Style
ABSTRACT Indenting with Style Bill Coar, Axio Research, Seattle, WA Within the pharmaceutical industry, many SAS programmers rely heavily on Proc Report. While it is used extensively for summary tables
More informationSTEP 1 - /*******************************/ /* Manipulate the data files */ /*******************************/ <<SAS DATA statements>>
Generalized Report Programming Techniques Using Data-Driven SAS Code Kathy Hardis Fraeman, A.K. Analytic Programming, L.L.C., Olney, MD Karen G. Malley, Malley Research Programming, Inc., Rockville, MD
More informationSAS/STAT 14.3 User s Guide The SURVEYFREQ Procedure
SAS/STAT 14.3 User s Guide The SURVEYFREQ Procedure This document is an individual chapter from SAS/STAT 14.3 User s Guide. The correct bibliographic citation for this manual is as follows: SAS Institute
More informationCorrecting for natural time lag bias in non-participants in pre-post intervention evaluation studies
Correcting for natural time lag bias in non-participants in pre-post intervention evaluation studies Gandhi R Bhattarai PhD, OptumInsight, Rocky Hill, CT ABSTRACT Measuring the change in outcomes between
More informationPROC MEANS for Disaggregating Statistics in SAS : One Input Data Set and One Output Data Set with Everything You Need
ABSTRACT Paper PO 133 PROC MEANS for Disaggregating Statistics in SAS : One Input Data Set and One Output Data Set with Everything You Need Imelda C. Go, South Carolina Department of Education, Columbia,
More informationPaper SDA-11. Logistic regression will be used for estimation of net error for the 2010 Census as outlined in Griffin (2005).
Paper SDA-11 Developing a Model for Person Estimation in Puerto Rico for the 2010 Census Coverage Measurement Program Colt S. Viehdorfer, U.S. Census Bureau, Washington, DC This report is released to inform
More informationRobust Linear Regression (Passing- Bablok Median-Slope)
Chapter 314 Robust Linear Regression (Passing- Bablok Median-Slope) Introduction This procedure performs robust linear regression estimation using the Passing-Bablok (1988) median-slope algorithm. Their
More informationEssential ODS Techniques for Creating Reports in PDF Patrick Thornton, SRI International, Menlo Park, CA
Thornton, S. P. (2006). Essential ODS techniques for creating reports in PDF. Paper presented at the Fourteenth Annual Western Users of the SAS Software Conference, Irvine, CA. Essential ODS Techniques
More informationSpatial Patterns Point Pattern Analysis Geographic Patterns in Areal Data
Spatial Patterns We will examine methods that are used to analyze patterns in two sorts of spatial data: Point Pattern Analysis - These methods concern themselves with the location information associated
More informationAn Easy Route to a Missing Data Report with ODS+PROC FREQ+A Data Step Mike Zdeb, FSL, University at Albany School of Public Health, Rensselaer, NY
SESUG 2016 Paper BB-170 An Easy Route to a Missing Data Report with ODS+PROC FREQ+A Data Step Mike Zdeb, FSL, University at Albany School of Public Health, Rensselaer, NY ABSTRACT A first step in analyzing
More informationThe G4GRID Procedure. Introduction APPENDIX 1
93 APPENDIX 1 The G4GRID Procedure Introduction 93 Data Considerations 94 Terminology 94 Using the Graphical Interface 94 Procedure Syntax 95 The PROC G4GRID Statement 95 The GRID Statement 97 The BY Statement
More informationMapping Clinical Data to a Standard Structure: A Table Driven Approach
ABSTRACT Paper AD15 Mapping Clinical Data to a Standard Structure: A Table Driven Approach Nancy Brucken, i3 Statprobe, Ann Arbor, MI Paul Slagle, i3 Statprobe, Ann Arbor, MI Clinical Research Organizations
More informationRight-click on whatever it is you are trying to change Get help about the screen you are on Help Help Get help interpreting a table
Q Cheat Sheets What to do when you cannot figure out how to use Q What to do when the data looks wrong Right-click on whatever it is you are trying to change Get help about the screen you are on Help Help
More informationWEB MATERIAL. eappendix 1: SAS code for simulation
WEB MATERIAL eappendix 1: SAS code for simulation /* Create datasets with variable # of groups & variable # of individuals in a group */ %MACRO create_simulated_dataset(ngroups=, groupsize=); data simulation_parms;
More informationSAS Enterprise Miner : Tutorials and Examples
SAS Enterprise Miner : Tutorials and Examples SAS Documentation February 13, 2018 The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2017. SAS Enterprise Miner : Tutorials
More informationChoosing the Right Procedure
3 CHAPTER 1 Choosing the Right Procedure Functional Categories of Base SAS Procedures 3 Report Writing 3 Statistics 3 Utilities 4 Report-Writing Procedures 4 Statistical Procedures 5 Efficiency Issues
More informationGenerating Customized Analytical Reports from SAS Procedure Output Brinda Bhaskar and Kennan Murray, RTI International
Abstract Generating Customized Analytical Reports from SAS Procedure Output Brinda Bhaskar and Kennan Murray, RTI International SAS has many powerful features, including MACRO facilities, procedures such
More informationT.I.P.S. (Techniques and Information for Programming in SAS )
Paper PO-088 T.I.P.S. (Techniques and Information for Programming in SAS ) Kathy Harkins, Carolyn Maass, Mary Anne Rutkowski Merck Research Laboratories, Upper Gwynedd, PA ABSTRACT: This paper provides
More informationEffects of PROC EXPAND Data Interpolation on Time Series Modeling When the Data are Volatile or Complex
Effects of PROC EXPAND Data Interpolation on Time Series Modeling When the Data are Volatile or Complex Keiko I. Powers, Ph.D., J. D. Power and Associates, Westlake Village, CA ABSTRACT Discrete time series
More informationTRANSFORMING MULTIPLE-RECORD DATA INTO SINGLE-RECORD FORMAT WHEN NUMBER OF VARIABLES IS LARGE.
TRANSFORMING MULTIPLE-RECORD DATA INTO SINGLE-RECORD FORMAT WHEN NUMBER OF VARIABLES IS LARGE. David Izrael, Abt Associates Inc., Cambridge, MA David Russo, Independent Consultant ABSTRACT In one large
More informationData Analysis and Solver Plugins for KSpread USER S MANUAL. Tomasz Maliszewski
Data Analysis and Solver Plugins for KSpread USER S MANUAL Tomasz Maliszewski tmaliszewski@wp.pl Table of Content CHAPTER 1: INTRODUCTION... 3 1.1. ABOUT DATA ANALYSIS PLUGIN... 3 1.3. ABOUT SOLVER PLUGIN...
More informationPROC REPORT Basics: Getting Started with the Primary Statements
Paper HOW07 PROC REPORT Basics: Getting Started with the Primary Statements Arthur L. Carpenter California Occidental Consultants, Oceanside, California ABSTRACT The presentation of data is an essential
More informationPharmaSUG China Paper 059
PharmaSUG China 2016 - Paper 059 Using SAS @ to Assemble Output Report Files into One PDF File with Bookmarks Sam Wang, Merrimack Pharmaceuticals, Inc., Cambridge, MA Kaniz Khalifa, Leaf Research Services,
More informationAn Algorithm to Compute Exact Power of an Unordered RxC Contingency Table
NESUG 27 An Algorithm to Compute Eact Power of an Unordered RC Contingency Table Vivek Pradhan, Cytel Inc., Cambridge, MA Stian Lydersen, Department of Cancer Research and Molecular Medicine, Norwegian
More informationA Simple Framework for Sequentially Processing Hierarchical Data Sets for Large Surveys
A Simple Framework for Sequentially Processing Hierarchical Data Sets for Large Surveys Richard L. Downs, Jr. and Pura A. Peréz U.S. Bureau of the Census, Washington, D.C. ABSTRACT This paper explains
More informationCleaning Duplicate Observations on a Chessboard of Missing Values Mayrita Vitvitska, ClinOps, LLC, San Francisco, CA
Cleaning Duplicate Observations on a Chessboard of Missing Values Mayrita Vitvitska, ClinOps, LLC, San Francisco, CA ABSTRACT Removing duplicate observations from a data set is not as easy as it might
More informationBase and Advance SAS
Base and Advance SAS BASE SAS INTRODUCTION An Overview of the SAS System SAS Tasks Output produced by the SAS System SAS Tools (SAS Program - Data step and Proc step) A sample SAS program Exploring SAS
More informationAn introduction to SPSS
An introduction to SPSS To open the SPSS software using U of Iowa Virtual Desktop... Go to https://virtualdesktop.uiowa.edu and choose SPSS 24. Contents NOTE: Save data files in a drive that is accessible
More information