USING MACROS TO CREATE PARAMETER DRIVEN PROCEDURES THAT SUMMARIZE AND PRESENT STATISTICAL OUTPUT IN TABULAR FORM
John A. Wenston
National Development and Research Institutes, Inc.

INTRODUCTION

While SAS has a wide range of powerful statistical procedures, finding relevant statistics, particularly when doing exploratory analysis, and presenting them in an easy-to-read format often involves wading through reams of output and constructing summary tables manually. Moreover, some procedures do not provide certain measures that are important to users in evaluating results. This paper demonstrates how to use the SAS macro facility to create parameter driven procedures that summarize and present statistical output in tabular form.

The strategy employed is to write procedures that "capture" key statistics and place them in a SAS data set, which can then be summarized by using DATA steps and PROCs, and printed using PUT statements, PROC PRINT, or PROC REPORT. There are two ways of getting statistics into a SAS data set. The first is to run statistical PROCs and direct printed output to a sequential data file with PROC PRINTTO. A SAS DATA step can then read this file, pick up key statistics, and put them into a SAS data set. The second is to use the OUTPUT and OUTEST data set options provided by most SAS statistical procedures. This paper will demonstrate both methods, using PROC LOGISTIC.

The data base used in the examples is from a NIDA-funded cross-sectional survey research project, Risk Factors for AIDS Among Intravenous Drug Users (NIDA # R01 DA3574, Don C. DesJarlais, Principal Investigator). The data base contains information from structured questionnaires and blood tests, and includes data on HIV status, demographics, risk behaviors and service utilization. Each observation contains approximately 900 variables, and there are over 2000 respondents enrolled in the study.

THE BASIC REPORT

Figure 1 is an example of a report based on PROC LOGISTIC.
The report summarizes the results of three separate bivariate logistic regressions. The dependent variable is HIV status (0 = negative, 1 = positive) and the predictor variables are SPDBALL (whether the respondent injects speedball, a mixture of heroin and cocaine), CRACK (whether the respondent smokes crack), and INJ4_DAY (whether the respondent injects 4+ times per day). For brevity's sake only three variables are included in the report. More commonly a report will summarize the results of 30 or 40 bivariate regressions.

[Figure 1. Report titled "MACRO EXAMPLE / Parameters and Odds Ratio Estimates Using Proc Logistic / Each Variable Was Entered in a Separate Equation", Dependent Variable=HIV. Columns: Predictor Variable, N for Event=0, N for Event=1, Beta, Std Err, Wald Chi Square, Odds Ratio, 95% CI Lower Bound, 95% CI Upper Bound, p(chi). Rows: SPDBALL, CRACK, INJ4_DAY. The numeric values are not legible in this transcription.]

The report macro uses another macro, MPARSE, that takes a list of variables as an argument (the variables must be separated from one another by at least one space) and returns macro variables in the form &X1, &X2, ..., &Xn, where &X1 contains the name of the first variable in the list, &X2 the second, etc. In addition, MPARSE creates a variable, &NVARS, that contains the number of variables in the list. The code for MPARSE is:

    %macro mparse(x=);
       %do i=1 %to 100;
          %global x&i;
       %end;
       %global nvars;
       %let i=1;
       %let z=init;
       %do %while (&i <= 100 and &z ne );
          %let z=%scan(&x,&i);
          %if &z ne %then %do;
             %let x&i = &z;
             %let nvars=&i;
          %end;
          %let i = %eval(&i+1);
       %end;
    %mend mparse;

As an example, the macro call

    %mparse(x=a b c);

would produce &X1, &X2 and &X3, equal to a, b, and c respectively, as well as &NVARS, equal to 3. The program sets up a DO WHILE loop and uses the %SCAN function to pick out the names in the list, which are then assigned to successively numbered macro variables. When the end of the list is reached, the function returns a null value and the program drops out of the loop. (The loop also ends after the 100th variable in the list is read.)

Note that the program declares the &Xi and &NVARS variables as GLOBAL. This is because the call to MPARSE is usually made from within another macro, and the %GLOBAL declarations are needed to make the variables available to the macro that makes the call. Also, note that the upper limit on the number of variables is set to 100. This can be adjusted simply by changing the limits of the loops in the macro.

1. REPORT USING PROC PRINTTO

Figure 2 is an example of printed output from PROC LOGISTIC. The output was produced by the following code:

    proc format;
       value yesnofmt 0='N' 1=' Y';
    proc logistic nosimple order=formatted;
       where race ne 1;   /* exclude whites */
       model hiv=spdball;
       format hiv yesnofmt.;
    run;

The option NOSIMPLE prevents the printing of simple statistics for each independent variable, thus reducing the length of the print file. The ORDER=FORMATTED option, along with yesnofmt, makes the event value (1=' Y', with a leading blank) sort lower than the non-event value (0='N'). This is done because the PROC takes the lower of two values to represent the event, while the Risk Factors study coding scheme uses 1 to represent the event and 0 the non-event.

[Figure 2. Printed output from PROC LOGISTIC for the model HIV = SPDBALL. Recoverable items: Data Set: WORK.RISK; Response Variable: HIV; Response Levels: 2; Number of Observations: 877; Link Function: Logit; Response Profile with Ordered Value 1 = Y and 2 = N; WARNING: 29 observation(s) were deleted due to missing values for the response or explanatory variables; Criteria for Assessing Model Fit (AIC, SC, -2 LOG L, Score); Analysis of Maximum Likelihood Estimates for INTERCPT and SPDBALL (Parameter Estimate, Standard Error, Wald Chi-Square, Pr > Chi-Square, Standardized Estimate); Association of Predicted Probabilities and Observed Responses (Concordant = 27.7%, Discordant = 15.7%, Tied = 56.7%). The remaining numeric values are not legible in this transcription.]

If the output shown in Figure 2 is directed to a file with PRINTTO, each record in the file will correspond to one line in the printout. Note that three lines contain items that are highlighted. These are the data items required for the report. The object is to design a program that will pick up this information from the print file, and to write the program using the macro language so that it is general purpose.

The macro that generates the logistic report is called BILOGS (for BIvariate LOGistic regressionS), and was developed on a CMS system using Version [number not legible]. The macro requires 5 parameters:

DATA - The name of the data set that the regressions will be performed on.
Y - The name of the dependent variable. This variable must be coded 0/1, with 1=event.
XLIST - A list of predictor variables, separated from one another by at least one space.
WHERE - An optional condition conforming to the rules of the WHERE statement that limits processing to a subset of the data.
TITLE - An optional title to appear at the top of each page of output.

The macro BILOGS follows:

    %macro bilogs(data=,xlist=,y=,where=,title=);
       /* parse xlist to create &x1-&xn, &nvars */
       %mparse(x=&xlist);
       proc format;
          value ynfmt 0='N' 1=' Y';
       options nocenter linesize=79;
       /* ***** Direct output to a disk file ***** */
       filename out1 'logdat dat' recfm=f lrecl=80;
       proc printto print=out1 new;
       run;
       /* Do regressions with output sent to file */
       %do i=1 %to &nvars;
          proc logistic nosimple data=&data order=formatted;
             %if &where ne %then %str(where &where;);
             model &y = &&x&i;
             format &y ynfmt.;
       %end;
       proc printto; run;   /* redirect print to the default device */
       /* Now read the file of printed output */
       data ddest;
          infile out1;
          retain n_yes 0 n_no 0;
          length firstwrd $ 8;
          /* ***** get first word in line ***** */
          input firstwrd $;
          /* Test to determine if it is "Value". If so, */
          /* get number yes(1) & no(0) responses        */
          if upcase(firstwrd)='VALUE' then do;
             input ordval1 fmtval1 $ n_yes
                   ordval0 fmtval0 $ n_no;
          end;
          /* Else determine if it is "INTERCPT". */
          /* If so, get the parameter values     */
          else if firstwrd='INTERCPT' then do;
             input name $ beta se chi pvalue sc;
             oddratio=exp(beta);   /* odds ratio */
             /* Compute lower & upper bounds 95% CI */
             low95=exp(beta-1.96*se);
             up95=exp(beta+1.96*se);
             depvar="&y";
             output;
          end;
          keep beta se chi oddratio low95 up95
               pvalue depvar name n_no n_yes;
       /* Now use Proc Print to print report */
       options center linesize=79 pageno=1;
       proc print label split='*';
          by depvar;
          var n_no n_yes beta se chi oddratio low95 up95 pvalue;
          id name;
          label name='Predictor*Variable'
                n_no='N for*Event=0'
                n_yes='N for*Event=1'
                beta='Beta'
                se='Std*Err'
                chi='Wald*Chi Square'
                oddratio='Odds*Ratio'
                low95='95% CI*Lower*Bound'
                up95='95% CI*Upper*Bound'
                pvalue='p(chi)'
                depvar='Dependent Variable';
          format oddratio low95 up95 pvalue 5.3 beta se 6.3;
          title "&title";
          title2 'Parameters and Odds Ratio Estimates Using Proc Logistic';
          title3 'Each Variable Was Entered in a Separate Equation';
       run;
    %mend bilogs;

The following command would produce the report shown in Figure 1:

    %bilogs(data=risk, y=hiv,
            xlist=spdball crack inj4_day,
            where=race ne 1,
            title=MACRO EXAMPLE);

The macro first invokes %MPARSE to create &X1=spdball, &X2=crack, &X3=inj4_day and &NVARS=3. Next, the macro uses PRINTTO to redirect any printed output that follows to a disk file rather than the default print device. The macro then sets up a loop that runs PROC LOGISTIC once for each X variable, sending three "pages" of output to the file LOGDAT DAT. Once the loop is finished, PRINTTO is invoked again with no PRINT= specification. This redirects any subsequent printed output back to the default device.

The next section of the program picks up information from the print file by reading the first word of every line and checking its contents for one of two "trigger" values. If the first word matches the value, the program reads the statistics from the next line in the file and puts them in the SAS data set being created. In Figure 2 the first pieces of information needed are the numbers of Y(1) and N(0) responses, which are immediately below the line beginning with the word "Value".
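To see the trigger-word technique in isolation, the following is a minimal sketch and not part of BILOGS. It reads a few in-stream lines that only imitate the layout of a PROC LOGISTIC listing. The data set name DEMO, the simplified header lines, and every count and estimate are invented for illustration and are not results from the study. Two further assumptions differ from the macro: TRUNCOVER is added so that short lines are not continued onto the next record, and the two response rows are read with two separate INPUT statements.

```sas
/* Hypothetical stand-alone illustration of trigger-word reading.    */
/* All values in the DATALINES block are made up for this sketch.    */
data demo;
   infile datalines truncover;
   length firstwrd $ 8 name $ 8 fmtval $ 1;
   retain n_yes n_no;                  /* keep counts until the estimates line */
   input firstwrd $;                   /* first word of the current line       */
   if upcase(firstwrd) = 'VALUE' then do;
      input ordval fmtval $ n_yes;     /* next line: event (Y) row             */
      input ordval fmtval $ n_no;      /* following line: non-event (N) row    */
   end;
   else if upcase(firstwrd) = 'INTERCPT' then do;
      input name $ beta se chi pvalue; /* next line: predictor row             */
      output;                          /* one observation per regression       */
   end;
   keep name n_yes n_no beta se chi pvalue;
   datalines;
Value  HIV  Count
    1  Y    100
    2  N    200
Variable Estimate Error Chi-Square Chi-Square
INTERCPT -1.0000 0.1000 100.0000 0.0001
SPDBALL   0.5000 0.2000   6.2500 0.0124
;
run;

proc print data=demo;
run;
```

Under these assumptions DEMO should contain a single observation for SPDBALL that carries the retained counts alongside the estimates. Lines whose first word is neither trigger (such as the column header line) are read and ignored.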
When the macro finds that the first word of a line is "Value", it reads the number of Yes responses from the next line into the variable N_YES, and reads the number of No responses from the following line into N_NO. Because of the RETAIN statement, these values will be retained through subsequent iterations of the DATA step.

The next trigger value is "INTERCPT". If the first word in the line is not "Value", the program checks for the word "INTERCPT". If it finds it, the macro drops to the next line and picks up the variable name, the beta coefficient, the standard error, the Wald chi-squared, the p-value of the chi-squared, and the standardized coefficient. The macro then calculates the odds ratio and 95% confidence limits, and OUTPUTs an observation to the SAS data set. The program will continue reading, and adding observations to the data set when it encounters trigger values, until it comes to the end of the print file. Then PROC PRINT is invoked to produce the report shown in Figure 1.

2. REPORTS USING OUTEST FILES

Figure 3 shows a sample of an OUTEST data set produced with PROC LOGISTIC. The code used to produce the output is:

    proc format;
       value yesnofmt 0='N' 1=' Y';
    proc logistic noprint order=formatted
                  outest=estimate covout;
       where race ne 1;
       model hiv=spdball;
       format hiv yesnofmt.;
    proc print data=estimate;
    run;

In addition to the OUTEST= option that produces the data set, the COVOUT option is used to generate the covariance matrix. As before, the ORDER=FORMATTED option is selected, and NOPRINT is also specified to suppress printed output.

Figure 3. OUTEST data set (numeric values in the INTERCEP and SPDBALL columns are not legible in this transcription)

    OBS   _LINK_   _TYPE_   _NAME_     INTERCEP   SPDBALL
     1    LOGIT    PARMS    ESTIMATE
     2    LOGIT    COV      INTERCPT
     3    LOGIT    COV      SPDBALL

The statistics needed to produce the report are highlighted. The first observation contains the beta coefficient of the predictor variable. The third observation contains the diagonal element of the estimated covariance matrix, the square root of which is the standard error of the beta. The Wald chi-squared is equal to the square of the ratio of the beta to its standard error, and the p-value is computed with respect to the chi-squared distribution with one degree of freedom.

The two statistics needed for the report that are not in the OUTEST data set are the numbers of yes and no responses. These can be computed by using PROC MEANS. The macro BILOGS2 uses an OUTEST data set and a data set produced by PROC MEANS to produce the report in Figure 1. The parameters used in the macro are identical to those in BILOGS. The code for BILOGS2 follows (the part of the code using PROC PRINT is identical to that in BILOGS and is omitted):

    %macro bilogs2(data=,xlist=,y=,where=,title=);
       /* parse xlist to create x1-xn & nvars */
       %mparse(x=&xlist);
       proc format;
          value ynfmt 0='N' 1=' Y';
       /* *** Do regressions with estimates placed in outest files *** */
       %do i=1 %to &nvars;
          proc logistic nosimple data=&data order=formatted
                        noprint covout outest=dd&i;
             %if &where ne %then %str(where &where;);
             model &y = &&x&i;
             format &y ynfmt.;
       %end;
       /* **** Now create files containing parameters for each x variable *** */
       %do i = 1 %to &nvars;
          data est&i;
             set dd&i;
             retain beta;
             /* 1st obs of each est file contains beta */
             if _n_=1 then beta = &&x&i;
             /* 3rd record contains std error */
             if _n_=3 then do;
                se=sqrt(&&x&i);
                /* From beta, std err compute other stats */
                chi=(beta/se)**2;
                pvalue=1-probchi(chi,1);
                oddratio=exp(beta);
                low95=exp(beta-1.96*se);
                up95=exp(beta+1.96*se);
                name="&&x&i";
                depvar="&y";
                output;
             end;
             keep beta se chi oddratio low95 up95
                  pvalue depvar name;
       %end;
       /* * Combine data sets containing stats */
       data ddest;
          set %do i = 1 %to &nvars; est&i %end; ;
       /* **** Now compute number of 0 and 1 answers for dependent var *** */
       proc means data=&data noprint;
          %if &where ne %then
             %str(where &where and (&y=1 or &y=0););
          %else %str(where &y=1 or &y=0;);
          class &y;
          var &xlist;
          output out=yes_no n=;
       /* create one obs for each variable */
       proc transpose out=y_n;
          where _type_=1;
       data yes_no;
          set y_n;
          %do i=1 %to &nvars;
             if upcase(_name_)=upcase("&&x&i") then do;
                n_no = col1;
                n_yes = col2;
                output;
             end;
          %end;
          keep n_yes n_no;
       /* Add info to data set containing stats */
       data ddest;
          merge ddest yes_no;
       /* Now use Proc Print to print report */
    %mend bilogs2;

As with BILOGS, the macro call is:

    %bilogs2(data=risk, y=hiv,
             xlist=spdball crack inj4_day,
             where=race ne 1,
             title=MACRO EXAMPLE);

BILOGS2 starts with MPARSE. But instead of proceeding to use PRINTTO, the macro does one regression for each of the 3 independent variables and puts the estimates out to separate OUTEST files: DD1, DD2 and DD3. Each file contains 3 observations, one with _TYPE_=PARMS and two with _TYPE_=COV (see Figure 3). A loop then computes statistics for each OUTEST data set, taking the beta from the first observation and the standard error from the third observation, and outputting three data sets, EST1, EST2 and EST3. These are then combined into a single data set, DDEST.

Next, the macro computes the number of HIV=1 and HIV=0 responses for each of the regressions. For this, the program uses PROC MEANS to create a data set with two records, the first containing the number of observations with HIV=0 and the second the number of observations with HIV=1 for each of the 3 independent variables. PROC TRANSPOSE is then used in conjunction with a DATA step to create a data set with one observation for each independent variable, with each observation containing the number of yes and no responses. Finally, this data set is MERGEd with DDEST to produce a data set that has all the relevant information, and can be used as input for PROC PRINT to produce the report.

CONCLUSION

The macros shown here are "bare bones" models of somewhat more elaborate ones that include options to print independent variable labels and specify the width of the confidence intervals. In addition, there are a number of other macros, including ones that summarize 2x2 contingency table output, t-tests and disease incidence risk ratios, that have been developed for the Risk Factors study and are used regularly at NDRI (code for these is available from the author). Regardless of how complex the macros are, however, all build upon the two methods presented in this paper.

In general, the method using PROC PRINTTO has the advantage of being fairly straightforward: one simply "picks up" from the print file the information he or she looks for when going through printouts manually. In addition, a knowledge of exactly how statistics are calculated is not required. On the down side, whenever the format of the printed output from a procedure changes, either from one version to the next or across platforms, the macro must be revised.

The method using OUTEST and OUTPUT files has the advantages of increased flexibility, in that one is not limited to the information on a printout, and portability. On the other hand, the programmer has to have a detailed knowledge of how certain statistics are calculated, and, since information may have to be drawn from different PROCs (as in the example presented here) and may require more file manipulation, more programming sophistication is needed.

NESUG '92 Proceedings
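For reference, the statistics that both macros derive from a coefficient and its standard error can be written compactly. The symbols below are notation introduced here for illustration and do not appear in the paper. The formulas restate the DATA step assignments in the macro listings: exp(beta), exp(beta -/+ 1.96*se), (beta/se)**2 and 1-probchi(chi,1), with the standard error taken as the square root of the diagonal covariance element.

```latex
\begin{align*}
\widehat{\mathrm{SE}} &= \sqrt{\widehat{\mathrm{Var}}(\hat\beta)} \\
\mathrm{OR} &= e^{\hat\beta} \\
\text{95\% CI for OR} &= \left( e^{\hat\beta - 1.96\,\widehat{\mathrm{SE}}},\;
                               e^{\hat\beta + 1.96\,\widehat{\mathrm{SE}}} \right) \\
\chi^2_{\mathrm{Wald}} &= \left( \hat\beta \,/\, \widehat{\mathrm{SE}} \right)^{2} \\
p &= 1 - F_{\chi^2_1}\!\left( \chi^2_{\mathrm{Wald}} \right)
\end{align*}
```

Here $F_{\chi^2_1}$ denotes the chi-squared distribution function with one degree of freedom. As a purely hypothetical check, $\hat\beta = 0.5$ and $\widehat{\mathrm{SE}} = 0.2$ give $\mathrm{OR} \approx 1.649$, a 95% interval of roughly $(1.114,\ 2.440)$, and $\chi^2_{\mathrm{Wald}} = 6.25$.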
More informationGeneralized Additive Model
Generalized Additive Model by Huimin Liu Department of Mathematics and Statistics University of Minnesota Duluth, Duluth, MN 55812 December 2008 Table of Contents Abstract... 2 Chapter 1 Introduction 1.1
More informationrange: [1,20] units: 1 unique values: 20 missing.: 0/20 percentiles: 10% 25% 50% 75% 90%
------------------ log: \Term 2\Lecture_2s\regression1a.log log type: text opened on: 22 Feb 2008, 03:29:09. cmdlog using " \Term 2\Lecture_2s\regression1a.do" (cmdlog \Term 2\Lecture_2s\regression1a.do
More informationPoisson Regression and Model Checking
Poisson Regression and Model Checking Readings GH Chapter 6-8 September 27, 2017 HIV & Risk Behaviour Study The variables couples and women_alone code the intervention: control - no counselling (both 0)
More informationMultidimensional Latent Regression
Multidimensional Latent Regression Ray Adams and Margaret Wu, 29 August 2010 In tutorial seven, we illustrated how ConQuest can be used to fit multidimensional item response models; and in tutorial five,
More informationCalculating measures of biological interaction
European Journal of Epidemiology (2005) 20: 575 579 Ó Springer 2005 DOI 10.1007/s10654-005-7835-x METHODS Calculating measures of biological interaction Tomas Andersson 1, Lars Alfredsson 1,2, Henrik Ka
More informationWeek 5: Multiple Linear Regression II
Week 5: Multiple Linear Regression II Marcelo Coca Perraillon University of Colorado Anschutz Medical Campus Health Services Research Methods I HSMP 7607 2017 c 2017 PERRAILLON ARR 1 Outline Adjusted R
More informationPaper CC-016. METHODOLOGY Suppose the data structure with m missing values for the row indices i=n-m+1,,n can be re-expressed by
Paper CC-016 A macro for nearest neighbor Lung-Chang Chien, University of North Carolina at Chapel Hill, Chapel Hill, NC Mark Weaver, Family Health International, Research Triangle Park, NC ABSTRACT SAS
More informationChapter 9 Robust Regression Examples
Chapter 9 Robust Regression Examples Chapter Table of Contents OVERVIEW...177 FlowChartforLMS,LTS,andMVE...179 EXAMPLES USING LMS AND LTS REGRESSION...180 Example 9.1 LMS and LTS with Substantial Leverage
More informationCHAPTER 1 INTRODUCTION
Introduction CHAPTER 1 INTRODUCTION Mplus is a statistical modeling program that provides researchers with a flexible tool to analyze their data. Mplus offers researchers a wide choice of models, estimators,
More informationConditional and Unconditional Regression with No Measurement Error
Conditional and with No Measurement Error /* reg2ways.sas */ %include 'readsenic.sas'; title2 ''; proc reg; title3 'Conditional Regression'; model infrisk = stay census; proc calis cov; /* Analyze the
More information1. What specialist uses information obtained from bones to help police solve crimes?
Mathematics: Modeling Our World Unit 4: PREDICTION HANDOUT VIDEO VIEWING GUIDE H4.1 1. What specialist uses information obtained from bones to help police solve crimes? 2.What are some things that can
More informationLinear Model Selection and Regularization. especially usefull in high dimensions p>>100.
Linear Model Selection and Regularization especially usefull in high dimensions p>>100. 1 Why Linear Model Regularization? Linear models are simple, BUT consider p>>n, we have more features than data records
More informationMultivariate Normal Random Numbers
Multivariate Normal Random Numbers Revised: 10/11/2017 Summary... 1 Data Input... 3 Analysis Options... 4 Analysis Summary... 5 Matrix Plot... 6 Save Results... 8 Calculations... 9 Summary This procedure
More informationCHAPTER 5. BASIC STEPS FOR MODEL DEVELOPMENT
CHAPTER 5. BASIC STEPS FOR MODEL DEVELOPMENT This chapter provides step by step instructions on how to define and estimate each of the three types of LC models (Cluster, DFactor or Regression) and also
More informationTHIS IS NOT REPRESNTATIVE OF CURRENT CLASS MATERIAL. STOR 455 Midterm 1 September 28, 2010
THIS IS NOT REPRESNTATIVE OF CURRENT CLASS MATERIAL STOR 455 Midterm September 8, INSTRUCTIONS: BOTH THE EXAM AND THE BUBBLE SHEET WILL BE COLLECTED. YOU MUST PRINT YOUR NAME AND SIGN THE HONOR PLEDGE
More informationbook 2014/5/6 15:21 page v #3 List of figures List of tables Preface to the second edition Preface to the first edition
book 2014/5/6 15:21 page v #3 Contents List of figures List of tables Preface to the second edition Preface to the first edition xvii xix xxi xxiii 1 Data input and output 1 1.1 Input........................................
More informationChapter 17: INTERNATIONAL DATA PRODUCTS
Chapter 17: INTERNATIONAL DATA PRODUCTS After the data processing and data analysis, a series of data products were delivered to the OECD. These included public use data files and codebooks, compendia
More informationAnalytical model A structure and process for analyzing a dataset. For example, a decision tree is a model for the classification of a dataset.
Glossary of data mining terms: Accuracy Accuracy is an important factor in assessing the success of data mining. When applied to data, accuracy refers to the rate of correct values in the data. When applied
More informationThe Proc Transpose Cookbook
ABSTRACT PharmaSUG 2017 - Paper TT13 The Proc Transpose Cookbook Douglas Zirbel, Wells Fargo and Co. Proc TRANSPOSE rearranges columns and rows of SAS datasets, but its documentation and behavior can be
More informationCREATING THE ANALYSIS
Chapter 14 Multiple Regression Chapter Table of Contents CREATING THE ANALYSIS...214 ModelInformation...217 SummaryofFit...217 AnalysisofVariance...217 TypeIIITests...218 ParameterEstimates...218 Residuals-by-PredictedPlot...219
More informationRSM Split-Plot Designs & Diagnostics Solve Real-World Problems
RSM Split-Plot Designs & Diagnostics Solve Real-World Problems Shari Kraber Pat Whitcomb Martin Bezener Stat-Ease, Inc. Stat-Ease, Inc. Stat-Ease, Inc. 221 E. Hennepin Ave. 221 E. Hennepin Ave. 221 E.
More informationECON Stata course, 3rd session
ECON4150 - Stata course, 3rd session Andrea Papini Heavily based on last year s session by Tarjei Havnes February 4, 2016 Stata course, 3rd session February 4, 2016 1 / 19 Before we start 1. Download caschool.dta
More information. UNDERSTANDING THE TRANSPOSE PROCEDURE
. UNDERSTANDING THE TRANSPOSE PROCEDURE Susan P. Repole ARC Professional Services Group There. are many times when a programmer will want to manipulate a SAS" data set by creating observations from variables
More informationShow how the LG-Syntax can be generated from a GUI model. Modify the LG-Equations to specify a different LC regression model
Tutorial #S1: Getting Started with LG-Syntax DemoData = 'conjoint.sav' This tutorial introduces the use of the LG-Syntax module, an add-on to the Advanced version of Latent GOLD. In this tutorial we utilize
More informationStrategies for Modeling Two Categorical Variables with Multiple Category Choices
003 Joint Statistical Meetings - Section on Survey Research Methods Strategies for Modeling Two Categorical Variables with Multiple Category Choices Christopher R. Bilder Department of Statistics, University
More informationChapter 6: Modifying and Combining Data Sets
Chapter 6: Modifying and Combining Data Sets The SET statement is a powerful statement in the DATA step. Its main use is to read in a previously created SAS data set which can be modified and saved as
More informationLecture 24: Generalized Additive Models Stat 704: Data Analysis I, Fall 2010
Lecture 24: Generalized Additive Models Stat 704: Data Analysis I, Fall 2010 Tim Hanson, Ph.D. University of South Carolina T. Hanson (USC) Stat 704: Data Analysis I, Fall 2010 1 / 26 Additive predictors
More informationBaruch College STA Senem Acet Coskun
Baruch College STA 9750 BOOK BUY A Predictive Mode Senem Acet Coskun Table of Contents Summary 3 Why this topic? 4 Data Sources 6 Variable Definitions 7 Descriptive Statistics 8 Univariate Analysis 9 Two-Sample
More informationPackage ridge. R topics documented: February 15, Title Ridge Regression with automatic selection of the penalty parameter. Version 2.
Package ridge February 15, 2013 Title Ridge Regression with automatic selection of the penalty parameter Version 2.1-2 Date 2012-25-09 Author Erika Cule Linear and logistic ridge regression for small data
More informationA New Method of Using Polytomous Independent Variables with Many Levels for the Binary Outcome of Big Data Analysis
Paper 2641-2015 A New Method of Using Polytomous Independent Variables with Many Levels for the Binary Outcome of Big Data Analysis ABSTRACT John Gao, ConstantContact; Jesse Harriott, ConstantContact;
More informationBayesian model selection and diagnostics
Bayesian model selection and diagnostics A typical Bayesian analysis compares a handful of models. Example 1: Consider the spline model for the motorcycle data, how many basis functions? Example 2: Consider
More informationPage 1 of 8. Language Development Study
Page 1 of 8 Language Development Study /* cread3.sas Read Castilla's language development data */ options nodate linesize=79 noovp formdlim=' '; title "Castilla's Language development Study"; proc format;
More informationBrief Guide on Using SPSS 10.0
Brief Guide on Using SPSS 10.0 (Use student data, 22 cases, studentp.dat in Dr. Chang s Data Directory Page) (Page address: http://www.cis.ysu.edu/~chang/stat/) I. Processing File and Data To open a new
More informationPharmaSUG China. model to include all potential prognostic factors and exploratory variables, 2) select covariates which are significant at
PharmaSUG China A Macro to Automatically Select Covariates from Prognostic Factors and Exploratory Factors for Multivariate Cox PH Model Yu Cheng, Eli Lilly and Company, Shanghai, China ABSTRACT Multivariate
More informationGeneralized Additive Models
Generalized Additive Models Statistics 135 Autumn 2005 Copyright c 2005 by Mark E. Irwin Generalized Additive Models GAMs are one approach to non-parametric regression in the multiple predictor setting.
More informationStatCalc User Manual. Version 9 for Mac and Windows. Copyright 2018, AcaStat Software. All rights Reserved.
StatCalc User Manual Version 9 for Mac and Windows Copyright 2018, AcaStat Software. All rights Reserved. http://www.acastat.com Table of Contents Introduction... 4 Getting Help... 4 Uninstalling StatCalc...
More informationUsing HLM for Presenting Meta Analysis Results. R, C, Gardner Department of Psychology
Data_Analysis.calm: dacmeta Using HLM for Presenting Meta Analysis Results R, C, Gardner Department of Psychology The primary purpose of meta analysis is to summarize the effect size results from a number
More informationSAS/STAT 14.3 User s Guide The SURVEYFREQ Procedure
SAS/STAT 14.3 User s Guide The SURVEYFREQ Procedure This document is an individual chapter from SAS/STAT 14.3 User s Guide. The correct bibliographic citation for this manual is as follows: SAS Institute
More informationSimulating Multivariate Normal Data
Simulating Multivariate Normal Data You have a population correlation matrix and wish to simulate a set of data randomly sampled from a population with that structure. I shall present here code and examples
More informationWorkshop 8: Model selection
Workshop 8: Model selection Selecting among candidate models requires a criterion for evaluating and comparing models, and a strategy for searching the possibilities. In this workshop we will explore some
More informationExploratory model analysis
Exploratory model analysis with R and GGobi Hadley Wickham 6--8 Introduction Why do we build models? There are two basic reasons: explanation or prediction [Ripley, 4]. Using large ensembles of models
More information