An Application of PROC NLP to Survey Sample Weighting
Talbot Michael Katz, Analytic Data Information Technologies, New York, NY

ABSTRACT

The classic weighting formula for survey respondents compensates for differences between each cell's proportion of respondents and its proportion of the target population. Such weighting also can be applied to cells based on variables of interest (beyond the experimental design). If even one cell has no responses, the entire weighting has to be reconsidered. An optimal reapportionment that attempts to preserve row / column marginals is proposed, with a PROC NLP implementation.

Keywords: PROC NLP, nonlinear programming, weight adjustment, nonresponse.

INTRODUCTION

SAS software provides several tools for the design of surveys and the analysis of survey data. But even well-planned and executed surveys can suffer from nonresponse. Both the PROC SURVEYMEANS and PROC SURVEYREG documentation for SAS/STAT software contain the following passage in their sections on missing values: "Once data collection is complete, you can use imputation to replace missing values with acceptable values, and you can use sampling weight adjustments to compensate for nonresponse. You should complete this data preparation and adjustment before you analyze your data with PROC SURVEY[REG/MEANS]." [1]

Several methods of weighting adjustment are already in use. One of the simplest methods multiplies each weight by the sum of base weights over all divided by the sum of base weights over responders [2]. Some of the more sophisticated methods use auxiliary data to build predictive models for probability of (non)response [3]. The appropriate weighting method to use may depend on the data available and the goals of the analysis.
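The simple adjustment described in [2] can be written compactly. The notation below is introduced here only for illustration and is not taken from [2]: w_i is the base weight of unit i, R is the set of responders, and the numerator sums over everyone sampled.

```latex
w_i^{\mathrm{adj}} \;=\; w_i \cdot
  \frac{\sum_{j \in \text{all}} w_j}{\sum_{j \in R} w_j},
  \qquad i \in R .
```

By construction, the adjusted weights of the responders add up to the total base weight of the full sample.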
The method proposed here is useful for situations in which cells of interest are based upon levels of two or more variables, and there is a desire to maintain the marginal weights of the respondents as closely as possible in proportion to the marginal population sums.

PROPORTIONAL WEIGHTING

Suppose we start with a population of size P and take a sample of size S. Suppose that the population can be split into subgroups, P_i, i = 1, ..., n, and the sample splits into corresponding subgroups S_i. In a perfect world, S_i / S = P_i / P for each i; then each sample subgroup has the correct proportion, and each individual in the sample can be given weight 1. In a still-sunny-but-slightly-less-than-perfect world, each S_i > 0; then each individual in group i can be assigned a weight of (P_i / P) * (S / S_i). These weights can be used in ANOVA or other modeling to extrapolate back to the original population. This is classic proportional weighting: the individual weights sum to the sample size S, and multiplying every weight by P / S rescales them to sum to the population size P.

Here is an easy example. Suppose the initial population P = 1000, P_1 = 400, P_2 = 300, P_3 = 200, P_4 = 100. Let S = 100, and S_1 = 20, S_2 = 10, S_3 = 20, S_4 = 50. Then w_1 = 2, w_2 = 3, w_3 = 1, w_4 = 0.2. Intuitively, groups 1 and 2 are low in the sample, so their individuals weigh more than 1; group 4 is high in the sample, so its members weigh less than 1; group 3 has the same proportion of the sample as it does of the general population, so its members have unit weight.

Proportional weighting breaks down if even a single cell is empty. Even if you decide to give the empty cell a weight of zero, the weights of the remaining individuals no longer account for the whole population: they sum to less than S, or to less than P after rescaling. The easiest way to save proportional weighting in the presence of empty cells is to combine the empty cells with non-empty cells, if practical.

Here is another easy example, with the same original population as the first example, but in this case S_1 = 20, S_2 = 30, S_3 = 0, S_4 = 50. If we can combine groups 3 and 4, then the weights are w_1 = 2, w_2 = 1, w_3-4 = 0.6. It is not always practical to combine cells, especially when the cells are created by values of two or more underlying variables (such as in a multifactorial design).
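The first example can be reproduced with a short PROC SQL step. This is only a sketch: the data set and variable names (cells, grp, pop, samp, w) are made up for illustration, and it assumes every cell has at least one respondent.

```sas
/* Population and sample counts for the four groups in the first example */
data cells;
   input grp pop samp;
   datalines;
1 400 20
2 300 10
3 200 20
4 100 50
;
run;

/* w = (P_i / P) * (S / S_i); PROC SQL remerges the SUM() totals onto each row */
proc sql;
   create table weights as
   select grp, pop, samp,
          (pop / sum(pop)) * (sum(samp) / samp) as w
   from cells
   order by grp;
quit;

proc print data=weights noobs;
run;
```

The printed weights should be 2, 3, 1 and 0.2, matching the values in the text. A cell with samp = 0 would produce a division by zero and a missing weight, which is the breakdown discussed above.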
EMPTY CELLS CREATED BY TWO OR MORE VARIABLES

Consider a population of 1000 workers who are classified in two ways, as employees (E) or contractors (C), and as SAS users (S) or the Unenlightened (U). Then these classifications can produce four subgroups, e.g., as follows:

              E      C   Total
   S        400    300     700
   U        200    100     300
   Total    600    400    1000

Then a sample of size 60 would get the following perfect weighting:

              E      C   Total
   S         24     18      42
   U         12      6      18
   Total     36     24      60

What if one of the actual sample quadrants is zero? Suppose the bottom right quadrant, CU, is zero. Then:

   - Combining CU and CS keeps the correct EC split, 36-24, but gives an SU split of 48-12.
   - Combining CU and EU keeps the correct SU split, 42-18, but gives an EC split of 42-18.
   - Combining CU and ES is hard to justify and gives an EC split of 42-18 and an SU split of 48-12.

If the cells are proportionally reweighted, each cell would be multiplied by 60 / 54, giving cell ES a weight of 26.67, EU a weight of 13.33, and CS a weight of 20. This gives an EC split of 40-20 and an SU split of 46.67-13.33, sort of a compromise between the first two combinations above.

Another possible resolution would be to try to reweight each non-empty cell as close as possible to its proportionate value. This could be done by solving a least squares minimization. In the example above, we would minimize:

   (24 - ES)^2 + (18 - CS)^2 + (12 - EU)^2, subject to ES + CS + EU = 60.

Substituting ES = 24 + d_1, CS = 18 + d_2, EU = 12 + d_3, this transforms to minimizing:

   (d_1)^2 + (d_2)^2 + (d_3)^2, subject to d_1 + d_2 + d_3 = 6.

The solution to this is d_1 = d_2 = d_3 = 2, an even spread of the missing cell's weight to the other cells. This would result in an EC split of 40-20 and an SU split of 46-14, a slightly better compromise than proportional reweighting in this case.
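The even-spread result follows from a short Lagrange multiplier argument. The sketch below uses k for the number of non-empty cells and m for the missing cell's proportionate weight (symbols introduced here for convenience); the example has k = 3 and m = 6.

```latex
\begin{aligned}
&\min_{d_1,\dots,d_k} \; \sum_{i=1}^{k} d_i^{2}
  \quad \text{subject to} \quad \sum_{i=1}^{k} d_i = m, \\
&\frac{\partial}{\partial d_i}\Bigl[\sum_{j} d_j^{2}
  - \lambda\Bigl(\sum_{j} d_j - m\Bigr)\Bigr]
  = 2 d_i - \lambda = 0
  \;\Longrightarrow\; d_i = \frac{\lambda}{2} \ \text{for every } i, \\
&\sum_{i=1}^{k} d_i = \frac{k\lambda}{2} = m
  \;\Longrightarrow\; d_i = \frac{m}{k} = \frac{6}{3} = 2 .
\end{aligned}
```

Because the objective is strictly convex and the constraint is linear, this stationary point is the unique minimum.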
The cell-by-cell least squares reapportionment example above generalizes to any number of missing cells, and the solution to getting the non-missing cells as close as possible to their proportionate values, in the least squares sense, is to spread the proportionate weight of the missing cells evenly among the remaining cells. This can be done in SAS without using anything as fancy as PROC NLP!

Proportional reweighting and even spread as practiced above always affect all the non-empty cells. A more targeted approach would be to leave unmodified the cells that share no values of the underlying variables with the empty cells, and only reweight the "guilty" cells that share variable values with the empty cells. In our example, the CS and EU cells are guilty and the ES cell is not. Then, guilty proportional reweighting would multiply the two guilty cells by 36 / 30, giving an EC split of 38.4-21.6 and an SU split of 45.6-14.4. Guilty even spread gives an EC split of 39-21 and an SU split of 45-15. The two guilty reapportionments are about equal to each other, and slightly better than the global reapportionments.

In this example, there actually is a reapportionment solution that perfectly maintains the marginals: ES = 18, CS = 24, EU = 18. However, it does throw off the relative proportions of the individual cells more than the above solutions. Also, in some cases, there is no perfect solution to maintain the marginals. If ES were the empty quadrant in the sample, instead of CU, then a perfect marginal solution would have to satisfy:

   CS + CU + EU = 60, CS + CU = 24, CU + EU = 18.

Adding the last two equations and subtracting the first would require CU = -18, which is impossible.

LEAST SQUARES MARGINALS MAINTENANCE

Least Squares minimization can be applied to any combination of cells, not just the individual ones. In particular, Least Squares can be used to try to get the closest approximation to the individual variable marginal splits. For the ES = 0 problem in the previous example, minimize:

   (42 - CS)^2 + (36 - EU)^2 + (18 - (EU + CU))^2 + (24 - (CS + CU))^2, subject to EU + CU + CS = 60.

The solution to this is CS = 33, EU = 27, CU = 0. Not very comforting, but this is a pretty extreme situation.

When there is a missing cell in a 2x2 table, there usually will be a unique solution to the Least Squares problem for the splits on the two individual variables. For larger problems, there may not be a unique solution. For example, in the 2x2 case above with no missing cells, the Least Squares set-up for the SU, EC splits is to minimize:

   (42 - (CS + ES))^2 + (36 - (EU + ES))^2 + (18 - (EU + CU))^2 + (24 - (CS + CU))^2, subject to EU + ES + CU + CS = 60.

The true values, ES = 24, CS = 18, EU = 12, CU = 6, solve this exactly, but so do ES = 20, CS = 22, EU = 16, CU = 2, and infinitely many other combinations.

2x2 cases can be done by hand, but what can handle more complex minimizations?...

PROC NLP

SAS has PROC NLP, a nonlinear optimizer, in the SAS/OR software module. It was introduced as an experimental release with SAS 6.08, was placed into production with SAS 6.09, and has been included with each subsequent release of SAS/OR. The archetypal problem is least squares minimization with linear constraints (of which the sample reweighting problem is an example), but since release 6.11 nonlinear constraints are also allowed. Please note that SAS/IML software also has nonlinear programming capabilities, and PROC NLIN in SAS/STAT uses some of the same techniques.
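Before handing the ES = 0 problem to an optimizer, the quoted solution can be sanity-checked by brute force. The sketch below (the data set name grid is arbitrary) evaluates the objective at every non-negative integer allocation that satisfies the constraint and lists the best few. It is a check on a small example, not a substitute for PROC NLP.

```sas
/* Enumerate all non-negative integer allocations with cs + cu + eu = 60 */
data grid;
   do cs = 0 to 60;
      do eu = 0 to 60 - cs;
         cu = 60 - cs - eu;          /* enforce the linear constraint */
         objval = (42 - cs)**2 + (36 - eu)**2
                + (18 - (eu + cu))**2 + (24 - (cs + cu))**2;
         output;
      end;
   end;
run;

/* Smallest objective values first */
proc sort data=grid;
   by objval;
run;

proc print data=grid (obs=5) noobs;
run;
```

The first row should be cs = 33, eu = 27, cu = 0 with objval = 324. Substituting cu = 60 - cs - eu shows why: the objective collapses to 2*(42 - cs)^2 + 2*(36 - eu)^2, whose unconstrained minimum at (42, 36) would need cu = -18, so the best feasible point lies on the boundary cu = 0.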
SAS has completely revamped the SAS/OR tool set for release 9.2, and while PROC NLP remains available, the preferred method will be to use PROC NLPC or PROC OPTQP.

Here is PROC NLP syntax for the simple 2x2 example above where ES = 0.

   PROC NLP OUTEST = nlpout1 TECHNIQUE = CONGRA /* NOPRINT */ ;
      MIN objval ;
      PARMS cs = 20, cu = 20, eu = 20 ;
      objval = (42 - cs)**2 + (36 - eu)**2 + (18 - (eu + cu))**2 + (24 - (cs + cu))**2 ;
      BOUNDS cs cu eu >= 0 ;
      LINCON 60 = cs + cu + eu ;
   RUN ;

PROC NLP Syntax Notes:

   - OUTEST : contains the optimization solution, including optimal parameter values, objective function value, and right hand sides of constraints.
   - TECHNIQUE : several solution techniques are available, most (not all) requiring derivative information on the objective function (the user can supply this independently, as in PROC NLIN, but doesn't always need to). CONGRA is conjugate gradient, which converges relatively easily.
   - PARMS : initial parameter values for the search.
   - LINCON : linear constraints.
   - BOUNDS : can have upper and lower bounds.

Here is an alternative syntax for the same problem:

   PROC NLP OUTEST = nlpout1 TECHNIQUE = CONGRA /* NOPRINT */ ;
      LSQ fc fe fs fu ;
      PARMS cs = 20, cu = 20, eu = 20 ;
      fc = (24 - (cs + cu)) ;
      fe = (36 - eu) ;
      fs = (42 - cs) ;
      fu = (18 - (eu + cu)) ;
      BOUNDS cs cu eu >= 0 ;
      LINCON 60 = cs + cu + eu ;
   RUN ;

The key pieces of the PROC NLP set up are the target values, the parameter variables, and the initial parameter variable values. The target values (42, 18, 36, 24, in the example above) are the proportional pieces of the sample for the groups of interest; in our case, separate groups for each individual trait variable level. There is one parameter variable for each non-empty Cartesian cell in the sample. The initial parameter values can be chosen in many ways; one way is to start with the actual sample counts in each cell.

To make this all useful, the task is to start with the initial data and go through the following steps:

   - For both the population and sample, compute the total counts, the Cartesian cell counts, and the counts for each individual variable level.
   - Use the counts to determine the number of NLP parameters, initial and target values, and generate the NLP step.
   - Translate the results of the NLP step into weights, and merge back with the initial data.
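The last step, turning fitted cell totals into respondent weights, is a simple division. In this sketch the fitted totals are the ES = 0 solution quoted earlier, while the sample cell counts (30, 20, 10) are hypothetical numbers chosen only to make the arithmetic visible; the data set and variable names are likewise illustrative.

```sas
/* count  = hypothetical number of respondents in the cell
   fitted = cell total returned by the optimizer (ES = 0 example) */
data cellwgt;
   input cell $ count fitted;
   weight = fitted / count;        /* weight applied to each respondent in the cell */
   wtotal + weight * count;        /* running check: should end at the sample total, 60 */
   datalines;
CS 30 33
EU 20 27
CU 10 0
;
run;

proc print data=cellwgt noobs;
run;
```

The weights come out as 1.1, 1.35 and 0. Every respondent in the CU cell receives weight 0 here, which is one reason to allow lower and upper bounds on the weights, as the macro in the next section does.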
A SAMPLE MACRO FOR THE LEAST SQUARES MARGINAL MAINTENANCE REWEIGHTING

This macro has several input parameters, including:

   inlibp                  population input data set library
   indsp                   population input data set name
   inlibs                  sample input data set library
   indss                   sample input data set name
   outlibs                 output data set library
   outdst                  trait value data set name
   outdss                  match-weights-to-sample data set name
   work                    "work" for work library, or other library to save intermediate data sets
   numtrait                number of traits (variables) determining cells
   trait1, trait2, ...     names of trait variables
   letter1, letter2, ...   short names of trait variables
   numctr                  number of character trait variables (list them first)
   ncids                   total sample count
   wlb                     weight lower bound
   wub                     weight upper bound
   techneek                optimization technique for PROC NLP
   wgtvar                  weight variable name

   * FIND POPULATION TRAIT VALUE PERCENTAGES (TARGETS OF REWEIGHTING SCHEME) ;
   %LET maxnval = 0 ; %* largest number of individual trait values ;
   %DO i = 1 %TO &numtrait. ;
      PROC FREQ DATA = &inlibp..&indsp. NOPRINT ;
         TABLES &&trait&i. / OUT = &work..ptr&i. ;
      RUN ;
      DATA _NULL_ ;
         SET &work..ptr&i. END = &last. ;
         RETAIN mintgt &ncids. ; * ncids is total sample count ;
         CALL SYMPUT("tv&i._" || COMPRESS(_N_), COMPRESS(&&trait&i.)) ; * individual trait value ;
         target = PERCENT * &ncids. / 100 ;
         IF target < mintgt THEN DO ;
            mintgt = target ;
         END ;
         CALL SYMPUT("tp&i._" || COMPRESS(_N_), COMPRESS(target)) ; * target percentage ;
         IF &last. THEN DO ;
            CALL SYMPUT("nv&i.", COMPRESS(_N_)) ; * number of values of trait ;
            CALL SYMPUT("mintgt", COMPRESS(mintgt)) ; * minimum target value ;
         END ;
      RUN ;
      %IF &&nv&i. > &maxnval. %THEN %DO ;
         %LET maxnval = &&nv&i. ;
      %END ;
   %END ; %* trait i freq ;
   %LET ntnv = %SYSEVALF(&numtrait. * &maxnval.) ; %* upper bound on number of NLP parameters ;

   * FIND ALL CELLS REPRESENTED IN SAMPLE ;
   PROC FREQ DATA = &inlibs..&indss. NOPRINT ;
      TABLES &trait1.
         %DO i = 2 %TO &numtrait. ;
            * &&trait&i.
         %END ; %* trait i ;
         / OUT = &work..smpcel1 ;
   RUN ;

   * FIND WHICH VARIABLES GO WITH WHICH TRAIT VALUES (ONE VARIABLE PER UNIQUE CELL) ;
   DATA &outlibs..&outdst. ;
      SET &work..smpcel1 END = &last. ;
      ARRAY vc{1:&numtrait.,1:&maxnval.} vc1 - vc&ntnv. ; * array to count number of variables which go with each trait value ;
      RETAIN vc1 - vc&ntnv. 0 ;
      DROP i j vc1 - vc&ntnv. wlbc wubc ;
      CALL SYMPUT("xi" || COMPRESS(_N_), COMPRESS(COUNT)) ; * use actual cell counts as initial variable values ;
      wlbc = &wlb. * COUNT ;
      wubc = &wub. * COUNT ;
      CALL SYMPUT("wl" || COMPRESS(_N_), COMPRESS(wlbc)) ; * to get proper lower bound on weight, have lower bound on cell variable be weight lower bound times cell count ;
      CALL SYMPUT("wu" || COMPRESS(_N_), COMPRESS(wubc)) ; * to get proper upper bound on weight, have upper bound on cell variable be weight upper bound times cell count ;
      %DO i = 1 %TO &numtrait. ;
         %LET li = &&letter&i. ;
         SELECT (&&trait&i.) ;
            %DO j = 1 %TO &&nv&i. ;
               %* assume char variables listed first ;
               WHEN
               %IF &i. LE &numctr. %THEN %DO ;
                  ("&&tv&i._&j.")
               %END ;
               %ELSE %DO ;
                  (&&tv&i._&j.)
               %END ;
               DO ; %* create list of vars with level j for trait i ;
                  vc{&i.,&j.} + 1 ;
                  CALL SYMPUT("x&li._&j._" || COMPRESS(vc{&i.,&j.}), COMPRESS(_N_)) ;
               END ;
            %END ; %* j 1 to nvi ;
            OTHERWISE ;
         END ;
      %END ; %* i 1 to numtrait ;
      IF &last. THEN DO ;
         CALL SYMPUT("numxvar", COMPRESS(_N_)) ; * number of variables ;
         DO i = 1 TO &numtrait. ;
            DO j = 1 TO &maxnval. ;
               CALL SYMPUT("vc" || COMPRESS(i) || "_" || COMPRESS(j), COMPRESS(vc{i,j})) ;
            END ; * j ;
         END ; * i ;
      END ;
   RUN ;

   * SET UP PROC NLP ;
   PROC NLP OUTEST = &work..nlpout1 NOPRINT TECHNIQUE = &techneek. ;
      MIN objval ;
      PARMS x1 = &xi1.
         %DO i = 2 %TO &numxvar. ;
            , x&i. = &&xi&i.
         %END ; %* i 2 to numxvar ;
         ;
      BOUNDS
         %LET numxvar1 = %SYSEVALF(&numxvar. - 1) ;
         %DO i = 1 %TO &numxvar1. ;
            &&wl&i. <= x&i. <= &&wu&i. ,
         %END ; %* i 1 to numxvar1 ;
         &&wl&i. <= x&i. <= &&wu&i. ;
      %LET notfirst = 0 ;
      LINCON &ncids. = x1
         %DO i = 2 %TO &numxvar. ;
            + x&i.
         %END ; %* i 2 to numxvar ;
         ;
      objval =
         %DO i = 1 %TO &numtrait. ;
            %LET li = &&letter&i. ;
            %DO j = 1 %TO &&nv&i. ;
               %IF &&vc&i._&j. %THEN %DO ; %* term irrelevant if no sample cells exist for it ;
                  %IF &notfirst. %THEN %DO ; %* plus sign to add successive terms after first ;
                     +
                  %END ;
                  %ELSE %DO ;
                     %LET notfirst = 1 ;
                  %END ;
                  %LET wtij = &&trwt&i. ;
                  &wtij. * (&&tp&i._&j. - (x&&x&li._&j._1.
                     %DO k = 2 %TO &&vc&i._&j. ;
                        + x&&x&li._&j._&k.
                     %END ; %* k 2 to vci_j ;
                  ))**2
               %END ; %* vci_j > 0 ;
            %END ; %* j 1 to nvi ;
         %END ; %* i 1 to numtrait ;
         ;
   RUN ;

   * EXTRACT SOLUTION FOR MATCHING WITH TRANSLATION SET ;
   PROC TRANSPOSE DATA = &work..nlpout1 (DROP = _NAME_ WHERE = (_TYPE_ = "PARMS"))
                  OUT = &work..nlparms1 ;
      VAR x1 - x&numxvar. ;
   RUN ;

   * MATCH SOLUTION WITH TRANSLATION SET AND SOLVE FOR WEIGHTS ;
   DATA &outlibs..&outdst. ;
      MERGE &work..nlparms1 &outlibs..&outdst. ; * merge had better be one to one ;
      DROP COL1 wlb wlbc wub wubc ;
      wlb = &wlb. ;
      wlbc = &wlb. * COUNT ;
      wub = &wub. ;
      wubc = &wub. * COUNT ;
      IF COL1 < wlbc THEN DO ;
         &wgtvar. = &wlb. ;
      END ;
      ELSE IF COL1 > wubc THEN DO ;
         &wgtvar. = &wub. ;
      END ;
      ELSE DO ;
         &wgtvar. = COL1 / COUNT ;
      END ;
   RUN ;

   * SORT SAMPLE TO MATCH WITH TRANSLATION SET AND APPLY WEIGHTS ;
   PROC SORT DATA = &inlibs..&indss. OUT = &work..indsrt1 ;
      BY
         %DO i = 1 %TO &numtrait. ;
            &&trait&i.
         %END ; %* i 1 to numtrait ;
         ;
   RUN ;

   * MATCH WEIGHTS TO SAMPLE ;
   DATA &outlibs..&outdss. ;
      MERGE &work..indsrt1 (IN = ins)
            &outlibs..&outdst. (IN = ino KEEP = &wgtvar.
               %DO i = 1 %TO &numtrait. ;
                  &&trait&i.
               %END ; %* i 1 to numtrait ;
            )
            END = &last. ;
      BY
         %DO i = 1 %TO &numtrait. ;
            &&trait&i.
         %END ; %* i 1 to numtrait ;
         ;
      DROP ctm ctn cts cto ;
      IF ins THEN DO ;
         IF ino THEN DO ;
            ctm + 1 ;
            OUTPUT ;
         END ;
         ELSE DO ;
            cts + 1 ; * should be zero ;
         END ;
      END ;
      ELSE IF ino THEN DO ;
         cto + 1 ; * should be zero ;
      END ;
      ELSE DO ;
         ctn + 1 ; * must be zero ;
      END ;
      IF &last. THEN DO ;
         PUT "ctm = " ctm ;
         PUT "cts = " cts ;
         PUT "cto = " cto ;
         PUT "ctn = " ctn ;
      END ;
   RUN ;
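To make the generated code concrete, here is roughly what the macro's PROC NLP step would expand to for the 2x2 example with the ES cell empty. Everything specific is an assumption made for illustration: cells ordered CS, CU, EU (so x1, x2, x3), hypothetical sample counts of 30, 10 and 20 as starting values, weight bounds wlb = 0 and wub = 5, trait weights (trwt) of 1, and CONGRA as the technique.

```sas
/* Hand expansion of the generated step (illustrative values, see lead-in) */
proc nlp outest = work.nlpout1 noprint technique = congra;
   min objval;
   parms x1 = 30, x2 = 10, x3 = 20;        /* starting values = sample cell counts */
   bounds 0 <= x1 <= 150,                  /* wlb*count <= x <= wub*count */
          0 <= x2 <= 50,
          0 <= x3 <= 100;
   lincon 60 = x1 + x2 + x3;               /* fitted totals add to the sample size */
   objval = 1 * (24 - (x1 + x2))**2        /* trait 1, level C */
          + 1 * (36 - (x3))**2             /* trait 1, level E */
          + 1 * (42 - (x1))**2             /* trait 2, level S */
          + 1 * (18 - (x2 + x3))**2;       /* trait 2, level U */
run;

/* The PARMS row of the OUTEST data set holds the fitted cell totals */
proc print data = work.nlpout1 (where = (_type_ = "PARMS"));
   var x1 - x3;
run;
```

If the optimizer converges, the PARMS row should be close to x1 = 33, x2 = 0, x3 = 27, the solution quoted earlier for this example. Dividing each value by its cell count then gives the weight that the macro merges back onto the sample.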
CONCLUSIONS

We have seen that reweighting to handle empty cells confronts us with many possible choices; different ones may be desirable depending upon circumstances. Many of the reweighting schemes are easy to apply. We showed that under many conditions, it is possible for a more sophisticated reweighting scheme to preserve the marginal distribution. This involves minimizing a quadratic objective function, and may best be accomplished with the assistance of nonlinear optimization software, such as the PROC NLP procedure of SAS/OR.

REFERENCES

[1]
[2] Department of Energy, 1995 Commercial Buildings Energy Consumption Survey.
[3] S.L. Vartivarian and R. Little, 2003, "Weighting Adjustments for Unit Nonresponse with Multiple Outcome Variables," University of Michigan Department of Biostatistics Working Paper Series.

ACKNOWLEDGMENTS

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies.

CONTACT INFORMATION

Your comments and questions are valued and encouraged. Contact the author at:

   Talbot Michael Katz
   Analytic Data Information Technologies
   229 East 21st Street, #2
   New York NY
   Phone:
   Fax:
   topkatz@msn.com
More informationfootnote1 height=8pt j=l "(Rev. &sysdate)" j=c "{\b\ Page}{\field{\*\fldinst {\b\i PAGE}}}";
Producing an Automated Data Dictionary as an RTF File (or a Topic to Bring Up at a Party If You Want to Be Left Alone) Cyndi Williamson, SRI International, Menlo Park, CA ABSTRACT Data dictionaries are
More informationWhat s New in SAS Studio?
ABSTRACT Paper SAS1832-2015 What s New in SAS Studio? Mike Porter, Amy Peters, and Michael Monaco, SAS Institute Inc., Cary, NC If you have not had a chance to explore SAS Studio yet, or if you re anxious
More informationSAS/STAT 14.2 User s Guide. The SURVEYIMPUTE Procedure
SAS/STAT 14.2 User s Guide The SURVEYIMPUTE Procedure This document is an individual chapter from SAS/STAT 14.2 User s Guide. The correct bibliographic citation for this manual is as follows: SAS Institute
More informationround decimals to the nearest decimal place and order negative numbers in context
6 Numbers and the number system understand and use proportionality use the equivalence of fractions, decimals and percentages to compare proportions use understanding of place value to multiply and divide
More informationThe new SAS 9.2 FCMP Procedure, what functions are in your future? John H. Adams, Boehringer Ingelheim Pharmaceutical, Inc.
PharmaSUG2010 - Paper AD02 The new SAS 9.2 FCMP Procedure, what functions are in your future? John H. Adams, Boehringer Ingelheim Pharmaceutical, Inc., Ridgefield, CT ABSTRACT Our company recently decided
More informationStatistics Case Study 2000 M. J. Clancy and M. C. Linn
Statistics Case Study 2000 M. J. Clancy and M. C. Linn Problem Write and test functions to compute the following statistics for a nonempty list of numeric values: The mean, or average value, is computed
More informationSimulation of Imputation Effects Under Different Assumptions. Danny Rithy
Simulation of Imputation Effects Under Different Assumptions Danny Rithy ABSTRACT Missing data is something that we cannot always prevent. Data can be missing due to subjects' refusing to answer a sensitive
More informationnumber Understand the equivalence between recurring decimals and fractions
number Understand the equivalence between recurring decimals and fractions Using and Applying Algebra Calculating Shape, Space and Measure Handling Data Use fractions or percentages to solve problems involving
More informationChecking for Duplicates Wendi L. Wright
Checking for Duplicates Wendi L. Wright ABSTRACT This introductory level paper demonstrates a quick way to find duplicates in a dataset (with both simple and complex keys). It discusses what to do when
More informationThe Piecewise Regression Model as a Response Modeling Tool
NESUG 7 The Piecewise Regression Model as a Response Modeling Tool Eugene Brusilovskiy University of Pennsylvania Philadelphia, PA Abstract The general problem in response modeling is to identify a response
More informationPaper PS05_05 Using SAS to Process Repeated Measures Data Terry Fain, RAND Corporation Cyndie Gareleck, RAND Corporation
Paper PS05_05 Using SAS to Process Repeated Measures Data Terry Fain, RAND Corporation Cyndie Gareleck, RAND Corporation ABSTRACT Data that contain multiple observations per case are called repeated measures
More informationInterleaving a Dataset with Itself: How and Why
cc002 Interleaving a Dataset with Itself: How and Why Howard Schreier, U.S. Dept. of Commerce, Washington DC ABSTRACT When two or more SAS datasets are combined by means of a SET statement and an accompanying
More informationFeature Extractors. CS 188: Artificial Intelligence Fall Nearest-Neighbor Classification. The Perceptron Update Rule.
CS 188: Artificial Intelligence Fall 2007 Lecture 26: Kernels 11/29/2007 Dan Klein UC Berkeley Feature Extractors A feature extractor maps inputs to feature vectors Dear Sir. First, I must solicit your
More informationMacros for Two-Sample Hypothesis Tests Jinson J. Erinjeri, D.K. Shifflet and Associates Ltd., McLean, VA
Paper CC-20 Macros for Two-Sample Hypothesis Tests Jinson J. Erinjeri, D.K. Shifflet and Associates Ltd., McLean, VA ABSTRACT Statistical Hypothesis Testing is performed to determine whether enough statistical
More informationHow to Go From SAS Data Sets to DATA NULL or WordPerfect Tables Anne Horney, Cooperative Studies Program Coordinating Center, Perry Point, Maryland
How to Go From SAS Data Sets to DATA NULL or WordPerfect Tables Anne Horney, Cooperative Studies Program Coordinating Center, Perry Point, Maryland ABSTRACT Clinical trials data reports often contain many
More information%MISSING: A SAS Macro to Report Missing Value Percentages for a Multi-Year Multi-File Information System
%MISSING: A SAS Macro to Report Missing Value Percentages for a Multi-Year Multi-File Information System Rushi Patel, Creative Information Technology, Inc., Arlington, VA ABSTRACT It is common to find
More informationData Quality Control for Big Data: Preventing Information Loss With High Performance Binning
Data Quality Control for Big Data: Preventing Information Loss With High Performance Binning ABSTRACT Deanna Naomi Schreiber-Gregory, Henry M Jackson Foundation, Bethesda, MD It is a well-known fact that
More information... ) city (city, cntyid, area, pop,.. )
PaperP829 PROC SQl - Is it a Required Tool for Good SAS Programming? Ian Whitlock, Westat Abstract No one SAS tool can be the answer to all problems. However, it should be hard to consider a SAS programmer
More informationProducing Summary Tables in SAS Enterprise Guide
Producing Summary Tables in SAS Enterprise Guide Lora D. Delwiche, University of California, Davis, CA Susan J. Slaughter, Avocet Solutions, Davis, CA ABSTRACT This paper shows, step-by-step, how to use
More informationThe SAS/OR s OPTMODEL Procedure :
The SAS/OR s OPTMODEL Procedure : A Powerful Modeling Environment for Building, Solving, and Maintaining Mathematical Optimization Models Maurice Djona OASUS - Wednesday, November 19 th, 2008 Agenda Context:
More informationThe REPORT Procedure: A Primer for the Compute Block
Paper TT15-SAS The REPORT Procedure: A Primer for the Compute Block Jane Eslinger, SAS Institute Inc. ABSTRACT It is well-known in the world of SAS programming that the REPORT procedure is one of the best
More informationLarge Margin Classification Using the Perceptron Algorithm
Large Margin Classification Using the Perceptron Algorithm Yoav Freund Robert E. Schapire Presented by Amit Bose March 23, 2006 Goals of the Paper Enhance Rosenblatt s Perceptron algorithm so that it can
More informationPaper DB2 table. For a simple read of a table, SQL and DATA step operate with similar efficiency.
Paper 76-28 Comparative Efficiency of SQL and Base Code When Reading from Database Tables and Existing Data Sets Steven Feder, Federal Reserve Board, Washington, D.C. ABSTRACT In this paper we compare
More informationLet's Play a Game: A SAS Program for Creating a Word Search Matrix Robert S. Matthews, University of Alabama at Birmingham, Birmingham, AL
SESUG 2012 Paper CT-18 Let's Play a Game: A SAS Program for Creating a Word Search Matrix Robert S. Matthews, University of Alabama at Birmingham, Birmingham, AL ABSTRACT This paper describes a process
More informationMath Lab- Geometry Pacing Guide Quarter 3. Unit 1: Rational and Irrational Numbers, Exponents and Roots
1 Jan. 3-6 (4 days) 2 Jan. 9-13 Unit 1: Rational and Irrational Numbers, Exponents and Roots ISTEP+ ISTEP Framework Focus: Unit 1 Number Sense, Expressions, and Computation 8.NS.1: Give examples of rational
More informationWhich of the following toolbar buttons would you use to find the sum of a group of selected cells?
Which of the following toolbar buttons would you use to find the sum of a group of selected cells? Selecting a group of cells and clicking on Set Print Area as shown in the figure below has what effect?
More informationTweaking your tables: Suppressing superfluous subtotals in PROC TABULATE
ABSTRACT Tweaking your tables: Suppressing superfluous subtotals in PROC TABULATE Steve Cavill, NSW Bureau of Crime Statistics and Research, Sydney, Australia PROC TABULATE is a great tool for generating
More informationCHAPTER 4: MICROSOFT OFFICE: EXCEL 2010
CHAPTER 4: MICROSOFT OFFICE: EXCEL 2010 Quick Summary A workbook an Excel document that stores data contains one or more pages called a worksheet. A worksheet or spreadsheet is stored in a workbook, and
More informationUnlock SAS Code Automation with the Power of Macros
SESUG 2015 ABSTRACT Paper AD-87 Unlock SAS Code Automation with the Power of Macros William Gui Zupko II, Federal Law Enforcement Training Centers SAS code, like any computer programming code, seems to
More informationPharmaSUG Paper AD06
PharmaSUG 2012 - Paper AD06 A SAS Tool to Allocate and Randomize Samples to Illumina Microarray Chips Huanying Qin, Baylor Institute of Immunology Research, Dallas, TX Greg Stanek, STEEEP Analytics, Baylor
More informationLab #9: ANOVA and TUKEY tests
Lab #9: ANOVA and TUKEY tests Objectives: 1. Column manipulation in SAS 2. Analysis of variance 3. Tukey test 4. Least Significant Difference test 5. Analysis of variance with PROC GLM 6. Levene test for
More informationMonte Carlo Integration
Lab 18 Monte Carlo Integration Lab Objective: Implement Monte Carlo integration to estimate integrals. Use Monte Carlo Integration to calculate the integral of the joint normal distribution. Some multivariable
More informationHelping You C What You Can Do with SAS
ABSTRACT Paper SAS1747-2015 Helping You C What You Can Do with SAS Andrew Henrick, Donald Erdman, and Karen Croft, SAS Institute Inc., Cary, NC SAS users are already familiar with the FCMP procedure and
More information2015 Vanderbilt University
Excel Supplement 2015 Vanderbilt University Introduction This guide describes how to perform some basic data manipulation tasks in Microsoft Excel. Excel is spreadsheet software that is used to store information
More informationSAS/STAT 13.1 User s Guide. The Power and Sample Size Application
SAS/STAT 13.1 User s Guide The Power and Sample Size Application This document is an individual chapter from SAS/STAT 13.1 User s Guide. The correct bibliographic citation for the complete manual is as
More informationComparison of different ways using table lookups on huge tables
PhUSE 007 Paper CS0 Comparison of different ways using table lookups on huge tables Ralf Minkenberg, Boehringer Ingelheim Pharma GmbH & Co. KG, Ingelheim, Germany ABSTRACT In many application areas the
More informationABSTRACT INTRODUCTION TRICK 1: CHOOSE THE BEST METHOD TO CREATE MACRO VARIABLES
An Efficient Method to Create a Large and Comprehensive Codebook Wen Song, ICF International, Calverton, MD Kamya Khanna, ICF International, Calverton, MD Baibai Chen, ICF International, Calverton, MD
More informationFormat-o-matic: Using Formats To Merge Data From Multiple Sources
SESUG Paper 134-2017 Format-o-matic: Using Formats To Merge Data From Multiple Sources Marcus Maher, Ipsos Public Affairs; Joe Matise, NORC at the University of Chicago ABSTRACT User-defined formats are
More informationSAS IT Resource Management Forecasting. Setup Specification Document. A SAS White Paper
SAS IT Resource Management Forecasting Setup Specification Document A SAS White Paper Table of Contents Introduction to SAS IT Resource Management Forecasting... 1 Getting Started with the SAS Enterprise
More informationWeighting and estimation for the EU-SILC rotational design
Weighting and estimation for the EUSILC rotational design JeanMarc Museux 1 (Provisional version) 1. THE EUSILC INSTRUMENT 1.1. Introduction In order to meet both the crosssectional and longitudinal requirements,
More information2 = Disagree 3 = Neutral 4 = Agree 5 = Strongly Agree. Disagree
PharmaSUG 2012 - Paper HO01 Multiple Techniques for Scoring Quality of Life Questionnaires Brandon Welch, Rho, Inc., Chapel Hill, NC Seungshin Rhee, Rho, Inc., Chapel Hill, NC ABSTRACT In the clinical
More informationBI-09 Using Enterprise Guide Effectively Tom Miron, Systems Seminar Consultants, Madison, WI
Paper BI09-2012 BI-09 Using Enterprise Guide Effectively Tom Miron, Systems Seminar Consultants, Madison, WI ABSTRACT Enterprise Guide is not just a fancy program editor! EG offers a whole new window onto
More informationJourney to the center of the earth Deep understanding of SAS language processing mechanism Di Chen, SAS Beijing R&D, Beijing, China
Journey to the center of the earth Deep understanding of SAS language processing Di Chen, SAS Beijing R&D, Beijing, China ABSTRACT SAS is a highly flexible and extensible programming language, and a rich
More informationNon-trivial extraction of implicit, previously unknown and potentially useful information from data
CS 795/895 Applied Visual Analytics Spring 2013 Data Mining Dr. Michele C. Weigle http://www.cs.odu.edu/~mweigle/cs795-s13/ What is Data Mining? Many Definitions Non-trivial extraction of implicit, previously
More informationSAS/STAT 13.1 User s Guide. The SURVEYFREQ Procedure
SAS/STAT 13.1 User s Guide The SURVEYFREQ Procedure This document is an individual chapter from SAS/STAT 13.1 User s Guide. The correct bibliographic citation for the complete manual is as follows: SAS
More information