Using Taylor's Linearization Technique in StEPS to Estimate Variances for Non-Linear Survey Estimators


Roger L. Goodwin, U.S. Bureau of the Census, Washington, DC 20233
Katherine J. Thompson, U.S. Bureau of the Census, Washington, DC 20233

Abstract: Estimating variances of non-linear functions involving two or more random variables is a challenging problem in survey sample variance estimation. A common approach to solving this problem is to linearize such functions using Taylor series methods, then estimate the variance of the linearized function using (1) a vector of first-order derivatives evaluated at the point estimates and (2) the variance-covariance matrix of all the function variables. This document describes the SAS macro used to implement the Taylor's Linearization Technique for variance estimation in the U.S. Census Bureau's Standard Economic Processing System (StEPS). All StEPS input parameters and calculated estimates are stored in SAS data sets using standard file formats. There are several implementation considerations associated with using these standard files, which are discussed in detail in the paper. Our macro uses BASE/SAS to evaluate the derivatives at the point estimates, then builds the variance-covariance matrix in PROC IML. The evaluated derivatives are read into PROC IML, from which we obtain the variance estimates using simple matrix multiplication. This paper is intended for people with an interest in BASE/SAS, SAS macros, and PROC IML for calculating variance estimates.

KEYWORDS: variance estimation, non-linear functions, SAS macro, SAS data steps, PROC IML

Background on Taylor's Linearization Technique: Let f be a non-linear function of two or more random variables, d be a column vector of derivatives of f evaluated at point estimates of the means of the function variables, and S be the variance-covariance matrix of all the function variables.
The Taylor Linearization Technique approximates the variance of f evaluated over the function variables with the expression d`Sd (Wolter, 1985; Sarndal, Swensson, and Wretman, 1992). If f is the ratio of two random variables X and Y, then there is a simple expression for the Taylor linearized variance, namely

   VAR(X/Y) ≈ (X/Y)² [ VAR(X)/X² + VAR(Y)/Y² − 2·COV(X,Y)/(X·Y) ].

This formula is hard-coded into %TAYLOR; all other non-linear functions require derivative formulas along with the associated point estimates, standard errors, and covariances. Ratio estimates are the most common type of non-linear estimator published by the Census Bureau's economic surveys. Hard-coding the formula for ratio estimates reduces implementor burden, since the user does not have to key in (and verify) expressions for several sets of derivatives, and it also decreases the potential for specification errors.

What Is StEPS? The Standardized Economic Processing System (StEPS) is a generalized survey processing system used in the Economic Directorate of the U.S. Census Bureau to process over 100 current economic surveys (Tasky and Ahmed, 1999). It is written entirely in SAS and operates in a UNIX environment. StEPS contains integrated modules for data-collection support, editing, data review and correction, imputation, calculation of estimates and variances, and system administration. The estimation and variance module consists of a set of SAS macros, each of which performs a specific estimation function (Sigman, 2000). StEPS users control estimation via scripts, which are SAS programs that invoke existing StEPS estimates and variances macros in a user-specified sequence. %TAYLOR is one of the StEPS estimates and variances macros.

StEPS stores macro-data in estimation results files (ERFs). See Figure 1. One ERF corresponds to one table, which is the result of StEPS performing calculations on analysis variables for individual values of categorical BY variables.
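Both routes to the ratio variance, the hard-coded shortcut and the general d`Sd form, can be illustrated with a short language-independent sketch (Python here, with invented numbers; %TAYLOR itself does all of this in SAS):

```python
# Taylor variance of a ratio X/Y two ways: the hard-coded shortcut that
# %TAYLOR uses for ratios, and the general d`Sd form. All numbers below
# are invented purely for illustration.
x, y = 150.0, 600.0
var_x, var_y, cov_xy = 9.0, 16.0, 2.5

# Hard-coded formula:
# VAR(X/Y) ~ (X/Y)^2 * [ VAR(X)/X^2 + VAR(Y)/Y^2 - 2*COV(X,Y)/(X*Y) ]
var_ratio = (x / y) ** 2 * (var_x / x ** 2 + var_y / y ** 2
                            - 2 * cov_xy / (x * y))

# General form: d holds the partial derivatives of f(X, Y) = X/Y
# evaluated at the point estimates; S is the variance-covariance matrix.
d = [1 / y, -x / y ** 2]
S = [[var_x, cov_xy],
     [cov_xy, var_y]]
var_taylor = sum(d[i] * S[i][j] * d[j] for i in range(2) for j in range(2))

assert abs(var_ratio - var_taylor) < 1e-12   # the two computations agree
```

The agreement of the two computations is exactly the check described later in the paper for verifying the non-ratio code path.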
This paper reports the results of research and analysis undertaken by Census Bureau staff. It has undergone a Census Bureau review more limited in scope than that given to official Census Bureau publications. This report is released to inform interested parties of ongoing research and to encourage discussion of work in progress.

An ERF contains bookkeeping variables such as the date and time of last modification, the name of the program that made the last modification, survey name, statistical period, etc. In addition to the bookkeeping variables, an ERF contains these key data fields:

ITEM1 and ITEM2: the names of the analysis variable(s). For example, a total would only need to use the ITEM1 column. A covariance would need to use both the ITEM1 and ITEM2 columns.

BY1, BY2, ...: every combination of every level of the BY variables in the table (including a total, denoted by .A).

TYPE: a string describing the type of estimate (e.g. an estimate of a total, denoted by EST; a standard error, denoted by STDERR; a covariance, denoted by COV; or a coefficient of variation, denoted by CV). The TYPE2 and TYPE3 columns further describe the estimate, such as whether it is an unadjusted or adjusted estimate and whether it is a total or a ratio.

NVALUE: the calculated value of the estimate listed in the ITEM1 column.

Where do StEPS estimation macros, such as %TAYLOR, get information on what to estimate and what data to use in the estimation? In part (at least for %TAYLOR), StEPS estimation macros read two files: the estimation specification file (ESF) and the estimation formula file (EFF). Both files are organized by table number. The EFF stores SAS expressions and SAS code used by the estimation macros. The ESF contains all other parameters used in the estimation modules. See Sigman (2000) for more details on the ESF and EFF. %TAYLOR uses the following member names to identify the input files for each table:

ERxxxx: an estimation results file for table xxxx containing all necessary point estimates, standard errors of point estimates, and covariances of the function variable(s).

ESF: an estimation specifications file containing specifications for BY variables and derived estimates (estimates that are functions of linear estimates) and associated derivatives.
If the expression for a derivative is less than 30 characters long, then the ESF also contains the SAS expression for the derivative function. See Figure 2.

EFF: an estimation formulas file containing the expressions for all other derivative functions. See Figure 3.

%TAYLOR is not limited to processing just one table at a time. A macro call to %TAYLOR invokes a %DO loop that processes every table the user specifies via the global macro variables TABLE1, TABLE2, ..., TABLEn.

Introduction to the Code: Set-Up

The first part of %TAYLOR splits up the derived estimates in a given table into two different processing sets: 1) a set of all ratio estimates, and 2) a set of all other non-linear estimates. The hard-coded formula for calculating the variance is applied to the ratio estimates, and Taylor's Linearization Technique is applied to the other non-linear estimates. The ESF data set name is stored in the macro variable &PARMS. The variable TABLE is used to select a particular table. The type of information stored in each ESF record is determined by the contents of OBJ_TYPE, as follows:

BY: a list of the BY (classification) variables.
TOTALS: a list of the survey-specific analysis variables.
DERIVE: a list of variables and expressions derived from the list of TOTALS variables. Derived estimates are calculated from totals estimates.

Depending on the contents of the OBJ_TYPE variable, the ESF variables VAL1, VAL2, VAL3, VAL4, CHAR1, CHAR2, STRING, etc. will be populated differently. See Figure 2 and Figure 4 for some examples of OBJ_TYPE = DERIVE records. In the following code, the macro variable RATIO is set to zero if no ratios exist, and is greater than zero if ratios exist.

/* create a dataset for the table that contains all ratio
   derived estimates. these records all have char1=E,
   char2 ne blank, and the first word in string = RATIO */
proc sql;
   create view ratio as
   select trim(scan(p.string,2)) as num,
          trim(scan(p.string,3)) as den,
          p.char3, p.val2, p.val3, p.val4
   from &parms as p

3 Figure. An Example of an Estimation Results File (ERF) Figure. An Example of the OBJ_TYPE = DERIVE Records of an ESF for Ratios Figure 3. En Example of an Estimation Formula File (EFF) Figure 4. An Example of the OBJ_TYPE = DERIVE Records of an ESF that Contains Derivatives

   where p.table eq upcase("&&table&tabno")
     and trim(scan(p.string,1)) eq "RATIO"
     and upcase(trim(p.obj_type)) eq "DERIVE"
     and p.char2 ne " "
     and p.char1 eq "E";
quit;

proc sql noprint;
   select count(*) into :ratio from ratio;
quit;

%if &ratio gt 0 %then %do;
   /* Use the hard-coded formula for calculating
      the variances of ratios */
%end;

For ratios, the standard errors and CVs are calculated for each BY level using the hard-coded formula. The code for the ratio portion of the program is quite straightforward. Consequently, the rest of the paper will deal exclusively with the code for calculating variances of non-ratio, non-linear functions of random variables.

After processing ratios (if any), %TAYLOR next looks for other non-linear, non-ratio functions. The following code sets the macro variable OTHER to 1 if there are non-linear, non-ratio functions of random variables.

/* look for non-linear functions excluding ratios */
%let other=0;
data _null_;
   set &parms;
   if (upcase(table) = upcase("&&table&tabno") and
       upcase(char1) = 'D' and
       trim(scan(string,1)) ne "RATIO")
   then call symput('other', '1');
run;

%if &other gt 0 %then %do;
   /* use Taylor's Linearization Technique for
      non-linear (non-ratio) estimators */
%end;

If the macro variable OTHER is 0, then either there are no non-linear functions or the non-linear functions are all ratios. We verified that the OTHER portion of the program was correct by calculating the variances of some ratios via the hard-coded formula and via Taylor's method. The results matched exactly.

Reading the Derivatives from the ESF and EFF:

A derivative can reside either in the ESF or the EFF. If the derivative is less than 30 characters long, it resides in the ESF. See Figure 4, observations 1 thru 5. If the derivative is 30 characters or longer, it resides in the EFF. %TAYLOR knows to look in the EFF by scanning the STRING variable in the ESF for the word CODE. For example, suppose you have a non-linear function f of four random variables X1, X2, X3, and X4, with one derivative of f taken with respect to each of the four variables.

As described earlier, the ESF variables VAL1, VAL2, VAL3, VAL4, CHAR1, CHAR2, STRING, etc. will be populated differently for these records than for the ratio records. See Figure 2 and Figure 4.

Figure 4 contains an example of an ESF with such a function and its derivatives. The derivatives are stored in OBJ_TYPE = DERIVE records of the ESF that have CHAR1 = D. Derivatives are defined in terms of variables specified on OBJ_TYPE = TOTALS records or on OBJ_TYPE = DERIVED records that have CHAR1 = E. Note that the same function name (stored in the VAL1 column; in this case it is just F) is used for multiple derivatives. In general, if a function is made up of two random variables, then the same function name is used for those two derivatives. If a function is made up of three random variables, then the same function name is used for those

three derivatives, and so on. The flag D in the CHAR1 field is used to distinguish this type of ESF record as a specification for a derivative. The VAL2 field contains the variable that the function was differentiated with respect to. The %TAYLOR macro does not assume any particular ordering of the derivatives in the ESF.

/* Read in the derivatives from the ESF and EFF files. */
data formulas;
   set &parms (where = (upcase(table) = upcase("&&table&tabno")
                        and upcase(char1) = 'D'));
   keep val1 val2 char3 string;
   i + 1;
   call symput("form"||trim(left(put(i,5.0))), upcase(trim(left(string))));
   call symput("va"||trim(left(put(i,5.0))), val2);
   call symput("ra"||trim(left(put(i,10.0))), trim(left(val1)));
   call symput("ra2"||trim(left(put(i,10.0))),
               trim(left(val1))||trim(left(put(i,5.))));
   /* the following variable is created to solve the 8-character
      limitation of variable names. */
   call symput("oth"||trim(left(put(i,5.0))), "o"||trim(left(put(i,5.))));
   /* store the last value of the index i */
   call symput("evar", trim(left(put(i,10.))));
run;

If the derivatives are in the EFF (not in the ESF), then the word CODE will appear in the STRING field of the ESF. The following SAS code checks for derivatives in the EFF. The matching keys to the EFF in the code below are:

1) The table number: field name TABLE.
2) The function name: field name VAL1.
3) The variable name of the wrt derivative: field name VAL2.

The following short macro, %GETMORE, was written to overcome SAS's objection to having %DO loops in open code.

/* get derivatives from the EFF file if any */
%macro getmore;
   %do i = 1 %to &evar;
      %if %bquote(%trim(%left(&&form&i))) = CODE %then %do;
         data _null_;
            set &moreform (where=(val1="&&ra&i" and val2="&&va&i"
                           and upcase(table) = %upcase("&&table&tabno")));
            call symput("form"||trim(left(put(&i,5.0))),
                        upcase(scan(code_,2,'=')));
         run;
      %end;  /* of if-then-do condition */
   %end;     /* of i do loop */
%mend getmore;
%getmore;

The %GETMORE macro replaces the word CODE (read from the ESF) with the appropriate derivative expression.
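The ESF/EFF lookup can be mimicked in a few lines. The following Python sketch stands in for the SAS data sets; the records, the function F, and its derivative expressions are invented for illustration:

```python
# Sketch of the ESF/EFF split: short derivative expressions live in the
# "ESF" records; records whose STRING field says CODE point to the "EFF".
# All structures, names, and expressions here are illustrative stand-ins.
esf = [  # (function name, wrt-variable, STRING field)
    ("F", "X1", "X2/(X3*X4)"),      # short expression: stored inline
    ("F", "X2", "CODE"),            # long expression: look it up in the EFF
]
eff = {("F", "X2"): "X1/(X3*X4)"}   # keyed by (function, wrt-variable)

def resolve(func, wrt):
    """Return the derivative expression, following CODE into the EFF."""
    for f, v, s in esf:
        if (f, v) == (func, wrt):
            return eff[(f, v)] if s == "CODE" else s
    raise KeyError((func, wrt))

# Evaluate the resolved derivatives at point estimates, as %EQTN does
# inside a data step (eval() stands in for the generated SAS statements).
pts = {"X1": 10.0, "X2": 20.0, "X3": 4.0, "X4": 5.0}
d = [eval(resolve("F", v), {}, pts) for v in ("X1", "X2")]
print(d)   # -> [1.0, 0.5]
```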
Evaluating the Derivatives at Given Point Estimates:

After reading the derivatives, to simplify the subsequent BY statements, we decided to create one giant BY variable, MYBY, which concatenates the value of each BY variable at every BY level. For example, suppose you have two BY variables, BY1 = NAICS and BY2 = STATE, as shown in Table 1:

Table 1. Two BY Variables with Various Levels

  BY1 = NAICS   BY2 = STATE
  35000         VA
  35000         MD
  35000         PA
  36000         VA
  36000         MD
  36000         PA

  600           VA
  600           MD
  600           PA

Instead of typing the following %DO loop for each data step:

data dsn;
   set dsn;
   by %do i = 1 %to &count; &&by&i %end; ;

create the MYBY variable in a data step as follows:

data table;
   set dsn;
   keep %do i = 1 %to &count; &&by&i %end; myby;
   myby = trim(left(put(by1, 10.0)))
          %do i = 2 %to &count;
             || trim(left(put(by&i, 10.0)))
          %end; ;

which would yield Table 2:

Table 2. MYBY Look-up Table

  BY1 = NAICS   BY2 = STATE   MYBY
  35000         VA            35000VA
  35000         MD            35000MD
  35000         PA            35000PA
  36000         VA            36000VA
  36000         MD            36000MD
  36000         PA            36000PA
  600           VA            600VA
  600           MD            600MD
  600           PA            600PA

Some notes: The PUT statement will work on both numeric variables and character variables. The MYBY table will be needed at the very end of the %TAYLOR macro to put the results into the ERF (which has a standard format that includes BY1, BY2, etc.).

As you will see, macro %SPLIT, which is described next, adds another column to the MYBY look-up table because SAS v6.12 limited data set names to 8 characters. Even with the expansion of data set name lengths in SAS v8, we still cannot guarantee that the BY1, BY2, etc. values will be less than 32 characters long when concatenated together.

Next, the macro %SPLIT divides the estimates in the ERF into smaller data sets, one for each value of the MYBY variable, since each MYBY level must have its own vector of evaluated derivatives and its own variance-covariance matrix. Ideally, we would have liked to name these smaller data sets after the MYBY level (e.g. B35000VA; see Destiny, 1998), but we could not, because MYBY can be longer than 8 characters (a SAS v6.12 limitation). Once the estimates have been divided up, the functions read in a previous data step are evaluated. The %SPLIT macro has two input parameters:

1) the data set name containing the point estimates (the ERF), and
2) the BY variable names (encoded as MYBY).

The data set resolved by &DSN was created when reading the point estimates.
We must map the concatenated BY values back to the appropriate detached BY variables.

/* Divide the original data set into smaller data sets, one for
   each "by" value. The macro has been modified from Macros in
   SAS Software to accommodate evaluating the derivatives at
   point estimates. Reference page 5 of Macros in SAS Software. */
%macro split(inputds, byvar);
%global numobs;
data _null_;
   set by_table end=eof;
   if eof then call symput('numobs', put(_n_, 5.));
data %do i = 1 %to &numobs; b&i %end; ;
   set &inputds;
   %let else=;

%do i = 1 %to &numobs;
   &else if &byvar = "&&b&i" then output b&i;
   %let else = else;
%end;  /* of i do loop */

/* Create macro variables for the estimates. Use two indices on the
   macro variable name: 1) identify the data set, 2) identify the
   variable */
%do j = 1 %to &numobs;
data b&j;
   set b&j end=eof;
   if _n_ = 1 then i=0;
   i + 1;
   call symput("var"||trim(left(put(&j,5.0)))||trim(left(put(i,5.0))),
               item1);
   call symput("val"||trim(left(put(&j,5.0)))||trim(left(put(i,5.0))),
               trim(put(nvalue, 14.5)));
   if eof then call symput("nvar"||trim(left(put(&j,5.0))), put(_n_,5.0));
%end;  /* of j do loop */

The point estimates and the derivatives for each MYBY level are put into the same data step using macro variables. The point estimates are assigned first for each variable in the derivative. Next, the derivatives are evaluated (e.g. VA1 = 1/200000). The derivatives were arbitrarily named VA1, VA2, VA3, etc. (the order in which the differentiation variables were read in is what matters). This is done for each B1, B2, B3, ... data set created in the previous code. Finally, the data sets are transposed for easy PROC IML manipulation.

/* put the point estimates in a data step with the derivatives
   for execution. */
%do j = 1 %to &numobs;
data b&j;
   set b&j;
   /* initialize the variables */
   %do k = 1 %to &&nvar&j;
      &&var&j&k = &&val&j&k;
   %end;
   /* eqtn is a macro to place the derivatives in the data step */
   %eqtn;
proc transpose data=b&j out=b&j;
%end;  /* of j do loop */
%mend split;

%split(&dsn, myby);

Each data set created by macro %SPLIT (e.g. B1, B2, B3, ...) contains the evaluated derivatives for each function and the variable that was differentiated with respect to. These data sets will be read into PROC IML to form the d vectors for calculating variances. Whatever survey-specific variable the derivative of the function was taken with respect to must appear in the covariance matrix in the same order.
The d vector (in PROC IML) contains the derivatives evaluated at the point estimates for each MYBY level for all of the functions. Thus, the d vector gets separated into smaller vectors, one for each function, in PROC IML. Reading the variances and covariances from a StEPS ERF is very similar to reading the point estimates.

What Happens to the MYBY Table? The MYBY table gets updated as follows:

Table 3. Updated MYBY Look-up Table

  BY1 = NAICS   BY2 = STATE   MYBY      Data Set Name
  35000         VA            35000VA   B1
  35000         MD            35000MD   B2
  35000         PA            35000PA   B3
  36000         VA            36000VA   B4
  36000         MD            36000MD   B5
  36000         PA            36000PA   B6
  600           VA            600VA     B7
  600           MD            600MD     B8
  600           PA            600PA     B9

Filling in the Data Set Name column is very easy. So far in %TAYLOR, no sorting has been done on the ERF or the ESF; those two files are in the exact same order as the last user left them. The only sorting in %TAYLOR occurs at the very end, to update the ERF with the standard errors and CVs via a matched merge.
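The MYBY construction and the grouping that %SPLIT performs can be sketched as follows (Python, with invented records; in %TAYLOR this is done with SAS macro loops and data steps):

```python
# Sketch of the MYBY idea: concatenate the BY values into one key,
# split the estimates into one group per key, and keep a look-up table
# that maps each key back to its detached BY values via a short group
# name (B1, B2, ...). The records below are illustrative, not a real ERF.
records = [
    {"NAICS": 35000, "STATE": "VA", "NVALUE": 1.2},
    {"NAICS": 35000, "STATE": "MD", "NVALUE": 3.4},
    {"NAICS": 36000, "STATE": "VA", "NVALUE": 5.6},
]
by_vars = ["NAICS", "STATE"]

groups, lookup = {}, []
for rec in records:
    myby = "".join(str(rec[v]).strip() for v in by_vars)   # e.g. "35000VA"
    if myby not in groups:
        groups[myby] = []
        # B1, B2, ... play the role of the short data set names
        lookup.append({"MYBY": myby, "DATASET": "B%d" % len(groups)})
    groups[myby].append(rec)

print(sorted(groups))        # -> ['35000MD', '35000VA', '36000VA']
print(lookup[0]["DATASET"])  # -> 'B1'
```

The look-up table plays the same role as Table 3: it is what lets the results be matched back to the standard BY1, BY2, ... columns of the ERF at the end.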

PROC IML: Building the Covariance Matrix

Building the covariance matrix involves ensuring that the ordering in the d vector corresponds with the ordering of the survey-specific variables in the covariance matrix S. The data sets created in the %SPLIT macro are brought into PROC IML with the USE command. A macro %DO loop executes PROC IML on each of the data sets created in the %SPLIT macro. There are &numobs data sets to be processed, one for each level of the BY variable. The data set names are accessed by the macro variable B&i.

Each ERF contains the same column names. Generally speaking, however, the column names are not in the same order from one ERF to the next. Thus, the column names must be read into PROC IML. The indices of the ITEM1, ITEM2, and TYPE column names must be stored in scalar variables; those scalar variables will be used as indices when building the covariance matrix. The CC vector contains the indices to the point estimates to be used in calculating the standard error. By creating the CC vector, the user can key in the derivative expressions in any order under StEPS. The vector R contains the names of the variables differentiated with respect to (i.e. the VAL2 StEPS variable). The vector Z contains the standard errors or covariances (actual numbers in this one) of the variables that were differentiated with respect to.

/* Fill the main diagonal of the covariance matrix */
m = nrow(w);             /* the matrix W contains non-numeric ERF data */
S = J(&evar, &evar, 0);  /* this will be the covariance matrix with
                            actual numbers in it */
B = J(&evar, &evar, " ");
/* The matrix B is used to store the variable names used on the
   diagonal of S. From that, the off-diagonal elements of B are then
   set. Without these names, it is impossible to fill in the rest of
   the S matrix. */
do p = 1 to m;       /* loop thru the ERF column names for matches */
   do q = 1 to k-1;  /* k is 1 plus the number of diagonal elements */
      if ((trim(w[p, item1]) = trim(r[cc[q]])) &
          (trim(w[p, type]) = "STDERR") &
          (trim(ratio) = trim(y[cc[q]]))) then do;
         S[q, q] = z[p]**2;    /* put the variance on the diagonal */
         B[q, q] = r[cc[q]];   /* store diagonal variable names */
         do a = 1 to q-1;
            B[q, a] = r[cc[a]];  /* store off-diagonal variable names */
         end;  /* of a loop */
      end;  /* of if-then-do statement */
   end;  /* of q loop */
end;  /* of p loop */

The technique is very simple to do by hand. Let's say you took derivatives of the function f with respect to the variables X1, X2, X3, and X4, and you wish to form a covariance matrix from a column of standard errors and covariances. The dimensions of the covariance matrix are 4 x 4. When the DO p and DO q nested loops execute, we have:

      | X1          |
  B = |    X2       |
      |       X3    |
      |          X4 |

When the DO a loop executes, we have:

      | X1          |
  B = | X1 X2       |
      | X1 X2 X3    |
      | X1 X2 X3 X4 |

Thus, the covariance matrix S should look something like:

      | VAR(X1)                                       |
  S = | COV(X1,X2)  VAR(X2)                           |
      | COV(X1,X3)  COV(X2,X3)  VAR(X3)               |
      | COV(X1,X4)  COV(X2,X4)  COV(X3,X4)  VAR(X4)   |

The matrix S contains the variances and covariances of the variables involved in the derivatives. Note that the IF-THEN conditions use the scalar variables that identify the columns for the ITEM1 (variable names of estimates), ITEM2 (more variable names of estimates, when appropriate), and TYPE (type of estimate, e.g. standard error, covariance, etc.) non-numeric data in the ERF. The conditions look slightly complicated because, for example, COV(X1, X2) = COV(X2, X1), and no particular ordering is assumed.

do k = 2 to q-1;      /* loop thru the rows of B */
   do p = 1 to k;     /* loop thru the columns of B */
      do a = 1 to m;  /* loop thru the rows of W. W contains
                         non-numeric ERF data */
         if (((trim(b[k, p]) = trim(w[a, item1])) &
              (trim(b[k, k]) = trim(w[a, item2]))) |
             ((trim(b[k, k]) = trim(w[a, item1])) &
              (trim(b[k, p]) = trim(w[a, item2])))) &
            (trim(w[a, type]) = "COV") then do;
            /* fill-in the lower portion covariances */
            S[p, k] = z[a];
            /* fill-in the upper portion covariances */
            S[k, p] = z[a];
         end;  /* of the if-then-do statement */
      end;  /* of a do loop */
   end;  /* of p do loop */
end;  /* of k do loop */

The Final Calculation:

With the d vector and the S matrix in hand, the final calculation in PROC IML becomes:

value = sqrt(d`*S*d);

PROC IML recognizes the asterisk (*) as matrix multiplication and the back-single-quote (`) as transposition. VALUE is stored in a vector with the other standard errors and output to a SAS data set after all the standard errors have been calculated. Forming the covariance matrix S is done for every level of the MYBY variable for every non-linear function (excluding ratios). After calculating the standard errors in PROC IML, the results are merged back into an ERF. %TAYLOR also looks in the ERF for the associated derived estimate of each non-linear function; if present, %TAYLOR calculates the coefficients of variation (CVs) and merges them into the ERF.

Concluding Remarks:

The Manufacturing Energy Consumption Survey (MECS) and the Plant Capacity Utilization (PCU) Survey were the first two surveys to use %TAYLOR. MECS used %TAYLOR to calculate standard errors and CVs of ratios. The ratio specifications were entered into StEPS by a survey statistician with very little mathematical/statistical experience; as outlined in the Introduction, no derivatives were required. PCU used %TAYLOR to calculate standard errors and CVs of year-to-year change in utilization rates. Their input functions were differences of ratios reflecting year-to-year change.
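The order-independent assembly of S and the final sqrt(d`Sd) computation described above can be pulled together in a short sketch (Python, with invented names and numbers; %TAYLOR does this in PROC IML):

```python
import math

# Sketch of the covariance-matrix assembly: ERF-style records carry
# (ITEM1, ITEM2, TYPE, value) in no particular order, and S must line up
# with the order of the differentiation variables in d. Every name and
# number below is invented for illustration.
order = ["X1", "X2", "X3"]            # order of the wrt-variables in d
d = [0.5, -0.25, 0.1]                 # evaluated derivatives
erf = [
    ("X2", "",   "STDERR", 3.0),
    ("X1", "",   "STDERR", 2.0),
    ("X3", "",   "STDERR", 1.0),
    ("X2", "X1", "COV",    0.4),      # COV(X2, X1) = COV(X1, X2)
    ("X3", "X1", "COV",    0.2),
    ("X2", "X3", "COV",    0.1),
]

n = len(order)
idx = {v: i for i, v in enumerate(order)}
S = [[0.0] * n for _ in range(n)]
for item1, item2, typ, val in erf:
    if typ == "STDERR":
        S[idx[item1]][idx[item1]] = val ** 2   # variance on the diagonal
    elif typ == "COV":                         # symmetric fill, either order
        i, j = idx[item1], idx[item2]
        S[i][j] = S[j][i] = val

# The final calculation, value = sqrt(d` S d):
value = math.sqrt(sum(d[i] * S[i][j] * d[j]
                      for i in range(n) for j in range(n)))
print(round(value, 6))
```

Because the covariance records are matched by name in either (ITEM1, ITEM2) order, no sorting of the input records is required, which mirrors the double condition in the IML code above.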
Each of those functions required 4 derivatives and 6 covariances prior to processing (in addition to point estimates and variances of the survey variables involved). There were a total of 48 levels of MYBY and, consequently, 48 small B1, B2, etc. data sets created in %SPLIT.

References:

Ahmed, Shirin A. and Tasky, D. L. (2000), "An Overview of the Standard Economic Processing System (StEPS)," The Second International Conference on Establishment Surveys.

Destiny Corporation (1998), Macros in SAS Software, Wethersfield, CT: Destiny Corporation.

Sarndal, Carl-Erik, Swensson, Bengt, and Wretman, Jan (1992), Model Assisted Survey Sampling, New York: Springer-Verlag.

Sigman, Richard (2000), "Estimation and Variance Estimation in a Standard Economic Processing System," The Second International Conference on Establishment Surveys.

Tasky, Deborah, Linonis, A., Ankers, S., Hallam, D., Atmayer, L., and Chew, D. (1999), "Get in Step with StEPS: Standard Economic Processing System," Proceedings of the North East SAS Users Group Conference.

Wolter, Kirk M. (1985), Introduction to Variance Estimation, New York: Springer-Verlag.

Contact Information:

Roger L. Goodwin
U.S. Bureau of the Census
4700 Silver Hill Road BLDG
Suitland, MD

Katherine J. Thompson
U.S. Bureau of the Census
4700 Silver Hill Road BLDG
Suitland, MD


More information

Sampling Financial Records Using SurveySelect

Sampling Financial Records Using SurveySelect Paper 3240-2015 Sampling Financial Records Using SurveySelect Roger L. Goodwin, US Government Printing Office ABSTRACT This paper presents an application of the procedure SurveySelect. The objective is

More information

A Side of Hash for You To Dig Into

A Side of Hash for You To Dig Into A Side of Hash for You To Dig Into Shan Ali Rasul, Indigo Books & Music Inc, Toronto, Ontario, Canada. ABSTRACT Within the realm of Customer Relationship Management (CRM) there is always a need for segmenting

More information

Keeping Track of Database Changes During Database Lock

Keeping Track of Database Changes During Database Lock Paper CC10 Keeping Track of Database Changes During Database Lock Sanjiv Ramalingam, Biogen Inc., Cambridge, USA ABSTRACT Higher frequency of data transfers combined with greater likelihood of changes

More information

Get Started Writing SAS Macros Luisa Hartman, Jane Liao, Merck Sharp & Dohme Corp.

Get Started Writing SAS Macros Luisa Hartman, Jane Liao, Merck Sharp & Dohme Corp. Get Started Writing SAS Macros Luisa Hartman, Jane Liao, Merck Sharp & Dohme Corp. ABSTRACT The SAS Macro Facility is a tool which lends flexibility to your SAS code and promotes easier maintenance. It

More information

KEYWORDS Metadata, macro language, CALL EXECUTE, %NRSTR, %TSLIT

KEYWORDS Metadata, macro language, CALL EXECUTE, %NRSTR, %TSLIT MWSUG 2017 - Paper BB15 Building Intelligent Macros: Driving a Variable Parameter System with Metadata Arthur L. Carpenter, California Occidental Consultants, Anchorage, Alaska ABSTRACT When faced with

More information

Tales from the Help Desk 6: Solutions to Common SAS Tasks

Tales from the Help Desk 6: Solutions to Common SAS Tasks SESUG 2015 ABSTRACT Paper BB-72 Tales from the Help Desk 6: Solutions to Common SAS Tasks Bruce Gilsen, Federal Reserve Board, Washington, DC In 30 years as a SAS consultant at the Federal Reserve Board,

More information

Graph Theory for Modelling a Survey Questionnaire Pierpaolo Massoli, ISTAT via Adolfo Ravà 150, Roma, Italy

Graph Theory for Modelling a Survey Questionnaire Pierpaolo Massoli, ISTAT via Adolfo Ravà 150, Roma, Italy Graph Theory for Modelling a Survey Questionnaire Pierpaolo Massoli, ISTAT via Adolfo Ravà 150, 00142 Roma, Italy e-mail: pimassol@istat.it 1. Introduction Questions can be usually asked following specific

More information

PharmaSUG Paper TT11

PharmaSUG Paper TT11 PharmaSUG 2014 - Paper TT11 What is the Definition of Global On-Demand Reporting within the Pharmaceutical Industry? Eric Kammer, Novartis Pharmaceuticals Corporation, East Hanover, NJ ABSTRACT It is not

More information

How to Keep Multiple Formats in One Variable after Transpose Mindy Wang

How to Keep Multiple Formats in One Variable after Transpose Mindy Wang How to Keep Multiple Formats in One Variable after Transpose Mindy Wang Abstract In clinical trials and many other research fields, proc transpose are used very often. When many variables with their individual

More information

To conceptualize the process, the table below shows the highly correlated covariates in descending order of their R statistic.

To conceptualize the process, the table below shows the highly correlated covariates in descending order of their R statistic. Automating the process of choosing among highly correlated covariates for multivariable logistic regression Michael C. Doherty, i3drugsafety, Waltham, MA ABSTRACT In observational studies, there can be

More information

%MISSING: A SAS Macro to Report Missing Value Percentages for a Multi-Year Multi-File Information System

%MISSING: A SAS Macro to Report Missing Value Percentages for a Multi-Year Multi-File Information System %MISSING: A SAS Macro to Report Missing Value Percentages for a Multi-Year Multi-File Information System Rushi Patel, Creative Information Technology, Inc., Arlington, VA ABSTRACT It is common to find

More information

The %let is a Macro command, which sets a macro variable to the value specified.

The %let is a Macro command, which sets a macro variable to the value specified. Paper 220-26 Structuring Base SAS for Easy Maintenance Gary E. Schlegelmilch, U.S. Dept. of Commerce, Bureau of the Census, Suitland MD ABSTRACT Computer programs, by their very nature, are built to be

More information

PharmaSUG 2013 CC26 Automating the Labeling of X- Axis Sanjiv Ramalingam, Vertex Pharmaceuticals, Inc., Cambridge, MA

PharmaSUG 2013 CC26 Automating the Labeling of X- Axis Sanjiv Ramalingam, Vertex Pharmaceuticals, Inc., Cambridge, MA PharmaSUG 2013 CC26 Automating the Labeling of X- Axis Sanjiv Ramalingam, Vertex Pharmaceuticals, Inc., Cambridge, MA ABSTRACT Labeling of the X-axis usually involves a tedious axis statement specifying

More information

Identifying Duplicate Variables in a SAS Data Set

Identifying Duplicate Variables in a SAS Data Set Paper 1654-2018 Identifying Duplicate Variables in a SAS Data Set Bruce Gilsen, Federal Reserve Board, Washington, DC ABSTRACT In the big data era, removing duplicate data from a data set can reduce disk

More information

STATION

STATION ------------------------------STATION 1------------------------------ 1. Which of the following statements displays all user-defined macro variables in the SAS log? a) %put user=; b) %put user; c) %put

More information

Basic SQL Processing Prepared by Destiny Corporation

Basic SQL Processing Prepared by Destiny Corporation Basic SQL Processing Prepared by Destiny Corporation SQLStatements PROC SQl consists often statements: from saved.computeg- l.select 2.vAlIDATE 3.DESCRIBE 4.CREATE S.DROP a.update 7.INSERT B.DElETE 9.ALTER

More information

Your Own SAS Macros Are as Powerful as You Are Ingenious

Your Own SAS Macros Are as Powerful as You Are Ingenious Paper CC166 Your Own SAS Macros Are as Powerful as You Are Ingenious Yinghua Shi, Department Of Treasury, Washington, DC ABSTRACT This article proposes, for user-written SAS macros, separate definitions

More information

SD10 A SAS MACRO FOR PERFORMING BACKWARD SELECTION IN PROC SURVEYREG

SD10 A SAS MACRO FOR PERFORMING BACKWARD SELECTION IN PROC SURVEYREG Paper SD10 A SAS MACRO FOR PERFORMING BACKWARD SELECTION IN PROC SURVEYREG Qixuan Chen, University of Michigan, Ann Arbor, MI Brenda Gillespie, University of Michigan, Ann Arbor, MI ABSTRACT This paper

More information

So Much Data, So Little Time: Splitting Datasets For More Efficient Run Times and Meeting FDA Submission Guidelines

So Much Data, So Little Time: Splitting Datasets For More Efficient Run Times and Meeting FDA Submission Guidelines Paper TT13 So Much Data, So Little Time: Splitting Datasets For More Efficient Run Times and Meeting FDA Submission Guidelines Anthony Harris, PPD, Wilmington, NC Robby Diseker, PPD, Wilmington, NC ABSTRACT

More information

Arthur L. Carpenter California Occidental Consultants, Oceanside, California

Arthur L. Carpenter California Occidental Consultants, Oceanside, California Paper 028-30 Storing and Using a List of Values in a Macro Variable Arthur L. Carpenter California Occidental Consultants, Oceanside, California ABSTRACT When using the macro language it is not at all

More information

SAS/STAT 13.1 User s Guide. The NESTED Procedure

SAS/STAT 13.1 User s Guide. The NESTED Procedure SAS/STAT 13.1 User s Guide The NESTED Procedure This document is an individual chapter from SAS/STAT 13.1 User s Guide. The correct bibliographic citation for the complete manual is as follows: SAS Institute

More information

Statistics, Data Analysis & Econometrics

Statistics, Data Analysis & Econometrics ST001 A SAS MACRO FOR THE CONTROLLED ROUNDING OF ONE- AND TWO-DIMENSIONAL TABLES OF REAL NUMBERS Robert D. Sands Bureau of the Census Keywords: Controlled Rounding, Tabular data, Transportation algorithm

More information

Statistical matching: conditional. independence assumption and auxiliary information

Statistical matching: conditional. independence assumption and auxiliary information Statistical matching: conditional Training Course Record Linkage and Statistical Matching Mauro Scanu Istat scanu [at] istat.it independence assumption and auxiliary information Outline The conditional

More information

PROGRAMMING ROLLING REGRESSIONS IN SAS MICHAEL D. BOLDIN, UNIVERSITY OF PENNSYLVANIA, PHILADELPHIA, PA

PROGRAMMING ROLLING REGRESSIONS IN SAS MICHAEL D. BOLDIN, UNIVERSITY OF PENNSYLVANIA, PHILADELPHIA, PA PROGRAMMING ROLLING REGRESSIONS IN SAS MICHAEL D. BOLDIN, UNIVERSITY OF PENNSYLVANIA, PHILADELPHIA, PA ABSTRACT SAS does not have an option for PROC REG (or any of its other equation estimation procedures)

More information

Sorting big datasets. Do we really need it? Daniil Shliakhov, Experis Clinical, Kharkiv, Ukraine

Sorting big datasets. Do we really need it? Daniil Shliakhov, Experis Clinical, Kharkiv, Ukraine PharmaSUG 2015 - Paper QT21 Sorting big datasets. Do we really need it? Daniil Shliakhov, Experis Clinical, Kharkiv, Ukraine ABSTRACT Very often working with big data causes difficulties for SAS programmers.

More information

186 Statistics, Data Analysis and Modeling. Proceedings of MWSUG '95

186 Statistics, Data Analysis and Modeling. Proceedings of MWSUG '95 A Statistical Analysis Macro Library in SAS Carl R. Haske, Ph.D., STATPROBE, nc., Ann Arbor, M Vivienne Ward, M.S., STATPROBE, nc., Ann Arbor, M ABSTRACT Statistical analysis plays a major role in pharmaceutical

More information

Contents of SAS Programming Techniques

Contents of SAS Programming Techniques Contents of SAS Programming Techniques Chapter 1 About SAS 1.1 Introduction 1.1.1 SAS modules 1.1.2 SAS module classification 1.1.3 SAS features 1.1.4 Three levels of SAS techniques 1.1.5 Chapter goal

More information

UNIT-IV: MACRO PROCESSOR

UNIT-IV: MACRO PROCESSOR UNIT-IV: MACRO PROCESSOR A Macro represents a commonly used group of statements in the source programming language. A macro instruction (macro) is a notational convenience for the programmer o It allows

More information

Using SAS/SCL to Create Flexible Programs... A Super-Sized Macro Ellen Michaliszyn, College of American Pathologists, Northfield, IL

Using SAS/SCL to Create Flexible Programs... A Super-Sized Macro Ellen Michaliszyn, College of American Pathologists, Northfield, IL Using SAS/SCL to Create Flexible Programs... A Super-Sized Macro Ellen Michaliszyn, College of American Pathologists, Northfield, IL ABSTRACT SAS is a powerful programming language. When you find yourself

More information

Simulating Multivariate Normal Data

Simulating Multivariate Normal Data Simulating Multivariate Normal Data You have a population correlation matrix and wish to simulate a set of data randomly sampled from a population with that structure. I shall present here code and examples

More information

From Manual to Automatic with Overdrive - Using SAS to Automate Report Generation Faron Kincheloe, Baylor University, Waco, TX

From Manual to Automatic with Overdrive - Using SAS to Automate Report Generation Faron Kincheloe, Baylor University, Waco, TX Paper 152-27 From Manual to Automatic with Overdrive - Using SAS to Automate Report Generation Faron Kincheloe, Baylor University, Waco, TX ABSTRACT This paper is a case study of how SAS products were

More information

Usage of R in Offi cial Statistics Survey Data Analysis at the Statistical Offi ce of the Republic of Slovenia

Usage of R in Offi cial Statistics Survey Data Analysis at the Statistical Offi ce of the Republic of Slovenia Usage of R in Offi cial Statistics Survey Data Analysis at the Statistical Offi ce of the Republic of Slovenia Jerneja PIKELJ (jerneja.pikelj@gov.si) Statistical Offi ce of the Republic of Slovenia ABSTRACT

More information

Are you Still Afraid of Using Arrays? Let s Explore their Advantages

Are you Still Afraid of Using Arrays? Let s Explore their Advantages Paper CT07 Are you Still Afraid of Using Arrays? Let s Explore their Advantages Vladyslav Khudov, Experis Clinical, Kharkiv, Ukraine ABSTRACT At first glance, arrays in SAS seem to be a complicated and

More information

Creating Macro Calls using Proc Freq

Creating Macro Calls using Proc Freq Creating Macro Calls using Proc Freq, Educational Testing Service, Princeton, NJ ABSTRACT Imagine you were asked to get a series of statistics/tables for each country in the world. You have the data, but

More information

The NESTED Procedure (Chapter)

The NESTED Procedure (Chapter) SAS/STAT 9.3 User s Guide The NESTED Procedure (Chapter) SAS Documentation This document is an individual chapter from SAS/STAT 9.3 User s Guide. The correct bibliographic citation for the complete manual

More information

Let s Get FREQy with our Statistics: Data-Driven Approach to Determining Appropriate Test Statistic

Let s Get FREQy with our Statistics: Data-Driven Approach to Determining Appropriate Test Statistic PharmaSUG 2018 - Paper EP-09 Let s Get FREQy with our Statistics: Data-Driven Approach to Determining Appropriate Test Statistic Richann Watson, DataRich Consulting, Batavia, OH Lynn Mullins, PPD, Cincinnati,

More information

PharmaSUG Paper SP04

PharmaSUG Paper SP04 PharmaSUG 2015 - Paper SP04 Means Comparisons and No Hard Coding of Your Coefficient Vector It Really Is Possible! Frank Tedesco, United Biosource Corporation, Blue Bell, Pennsylvania ABSTRACT When doing

More information

In this paper, we will build the macro step-by-step, highlighting each function. A basic familiarity with SAS Macro language is assumed.

In this paper, we will build the macro step-by-step, highlighting each function. A basic familiarity with SAS Macro language is assumed. No More Split Ends: Outputting Multiple CSV Files and Keeping Related Records Together Gayle Springer, JHU Bloomberg School of Public Health, Baltimore, MD ABSTRACT The EXPORT Procedure allows us to output

More information

Matching Rules: Too Loose, Too Tight, or Just Right?

Matching Rules: Too Loose, Too Tight, or Just Right? Paper 1674-2014 Matching Rules: Too Loose, Too Tight, or Just Right? Richard Cadieux, Towers Watson, Arlington, VA & Daniel R. Bretheim, Towers Watson, Arlington, VA ABSTRACT This paper describes a technique

More information

Paper Appendix 4 contains an example of a summary table printed from the dataset, sumary.

Paper Appendix 4 contains an example of a summary table printed from the dataset, sumary. Paper 93-28 A Macro Using SAS ODS to Summarize Client Information from Multiple Procedures Stuart Long, Westat, Durham, NC Rebecca Darden, Westat, Durham, NC Abstract If the client requests the programmer

More information

INTRODUCTION TO SAS HOW SAS WORKS READING RAW DATA INTO SAS

INTRODUCTION TO SAS HOW SAS WORKS READING RAW DATA INTO SAS TO SAS NEED FOR SAS WHO USES SAS WHAT IS SAS? OVERVIEW OF BASE SAS SOFTWARE DATA MANAGEMENT FACILITY STRUCTURE OF SAS DATASET SAS PROGRAM PROGRAMMING LANGUAGE ELEMENTS OF THE SAS LANGUAGE RULES FOR SAS

More information

Create a Format from a SAS Data Set Ruth Marisol Rivera, i3 Statprobe, Mexico City, Mexico

Create a Format from a SAS Data Set Ruth Marisol Rivera, i3 Statprobe, Mexico City, Mexico PharmaSUG 2011 - Paper TT02 Create a Format from a SAS Data Set Ruth Marisol Rivera, i3 Statprobe, Mexico City, Mexico ABSTRACT Many times we have to apply formats and it could be hard to create them specially

More information

Program Validation: Logging the Log

Program Validation: Logging the Log Program Validation: Logging the Log Adel Fahmy, Symbiance Inc., Princeton, NJ ABSTRACT Program Validation includes checking both program Log and Logic. The program Log should be clear of any system Error/Warning

More information

%Addval: A SAS Macro Which Completes the Cartesian Product of Dataset Observations for All Values of a Selected Set of Variables

%Addval: A SAS Macro Which Completes the Cartesian Product of Dataset Observations for All Values of a Selected Set of Variables %Addval: A SAS Macro Which Completes the Cartesian Product of Dataset Observations for All Values of a Selected Set of Variables Rich Schiefelbein, PRA International, Lenexa, KS ABSTRACT It is often useful

More information

A SAS Macro Utility to Modify and Validate RTF Outputs for Regional Analyses Jagan Mohan Achi, PPD, Austin, TX Joshua N. Winters, PPD, Rochester, NY

A SAS Macro Utility to Modify and Validate RTF Outputs for Regional Analyses Jagan Mohan Achi, PPD, Austin, TX Joshua N. Winters, PPD, Rochester, NY PharmaSUG 2014 - Paper BB14 A SAS Macro Utility to Modify and Validate RTF Outputs for Regional Analyses Jagan Mohan Achi, PPD, Austin, TX Joshua N. Winters, PPD, Rochester, NY ABSTRACT Clinical Study

More information

Checking for Duplicates Wendi L. Wright

Checking for Duplicates Wendi L. Wright Checking for Duplicates Wendi L. Wright ABSTRACT This introductory level paper demonstrates a quick way to find duplicates in a dataset (with both simple and complex keys). It discusses what to do when

More information

Top Coding Tips. Neil Merchant Technical Specialist - SAS

Top Coding Tips. Neil Merchant Technical Specialist - SAS Top Coding Tips Neil Merchant Technical Specialist - SAS Bio Work in the ANSWERS team at SAS o Analytics as a Service and Visual Analytics Try before you buy SAS user for 12 years obase SAS and O/S integration

More information

Using SAS software to shrink the data in your applications

Using SAS software to shrink the data in your applications Paper 991-2016 Using SAS software to shrink the data in your applications Ahmed Al-Attar, AnA Data Warehousing Consulting LLC, McLean, VA ABSTRACT This paper discusses the techniques I used at the Census

More information

The Proc Transpose Cookbook

The Proc Transpose Cookbook ABSTRACT PharmaSUG 2017 - Paper TT13 The Proc Transpose Cookbook Douglas Zirbel, Wells Fargo and Co. Proc TRANSPOSE rearranges columns and rows of SAS datasets, but its documentation and behavior can be

More information

Paper An Automated Reporting Macro to Create Cell Index An Enhanced Revisit. Shi-Tao Yeh, GlaxoSmithKline, King of Prussia, PA

Paper An Automated Reporting Macro to Create Cell Index An Enhanced Revisit. Shi-Tao Yeh, GlaxoSmithKline, King of Prussia, PA ABSTRACT Paper 236-28 An Automated Reporting Macro to Create Cell Index An Enhanced Revisit When generating tables from SAS PROC TABULATE or PROC REPORT to summarize data, sometimes it is necessary to

More information

Automatic Indicators for Dummies: A macro for generating dummy indicators from category type variables

Automatic Indicators for Dummies: A macro for generating dummy indicators from category type variables MWSUG 2018 - Paper AA-29 Automatic Indicators for Dummies: A macro for generating dummy indicators from category type variables Matthew Bates, Affusion Consulting, Columbus, OH ABSTRACT Dummy Indicators

More information

Appendix B BASIC MATRIX OPERATIONS IN PROC IML B.1 ASSIGNING SCALARS

Appendix B BASIC MATRIX OPERATIONS IN PROC IML B.1 ASSIGNING SCALARS Appendix B BASIC MATRIX OPERATIONS IN PROC IML B.1 ASSIGNING SCALARS Scalars can be viewed as 1 1 matrices and can be created using Proc IML by using the statement x¼scalar_value or x¼{scalar_value}. As

More information

SAS seminar. The little SAS book Chapters 3 & 4. April 15, Åsa Klint. By LD Delwiche and SJ Slaughter. 3.1 Creating and Redefining variables

SAS seminar. The little SAS book Chapters 3 & 4. April 15, Åsa Klint. By LD Delwiche and SJ Slaughter. 3.1 Creating and Redefining variables SAS seminar April 15, 2003 Åsa Klint The little SAS book Chapters 3 & 4 By LD Delwiche and SJ Slaughter Data step - read and modify data - create a new dataset - performs actions on rows Proc step - use

More information

Know What You Are Missing: How to Catalogue and Manage Missing Pieces of Historical Data

Know What You Are Missing: How to Catalogue and Manage Missing Pieces of Historical Data Know What You Are Missing: How to Catalogue and Manage Missing Pieces of Historical Data Shankar Yaddanapudi, SAS Consultant, Washington DC ABSTRACT In certain applications it is necessary to maintain

More information

Out of Control! A SAS Macro to Recalculate QC Statistics

Out of Control! A SAS Macro to Recalculate QC Statistics Paper 3296-2015 Out of Control! A SAS Macro to Recalculate QC Statistics Jesse Pratt, Colleen Mangeot, Kelly Olano, Cincinnati Children s Hospital Medical Center, Cincinnati, OH, USA ABSTRACT SAS/QC provides

More information

Smoking and Missingness: Computer Syntax 1

Smoking and Missingness: Computer Syntax 1 Smoking and Missingness: Computer Syntax 1 Computer Syntax SAS code is provided for the logistic regression imputation described in this article. This code is listed in parts, with description provided

More information

Geocoding Crashes in Limbo Carol Martell and Daniel Levitt Highway Safety Research Center, Chapel Hill, NC

Geocoding Crashes in Limbo Carol Martell and Daniel Levitt Highway Safety Research Center, Chapel Hill, NC Paper RIV-09 Geocoding Crashes in Limbo Carol Martell and Daniel Levitt Highway Safety Research Center, Chapel Hill, NC ABSTRACT In North Carolina, crash locations are documented only with the road names

More information

Lab 9. Julia Janicki. Introduction

Lab 9. Julia Janicki. Introduction Lab 9 Julia Janicki Introduction My goal for this project is to map a general land cover in the area of Alexandria in Egypt using supervised classification, specifically the Maximum Likelihood and Support

More information

A Quick and Gentle Introduction to PROC SQL

A Quick and Gentle Introduction to PROC SQL ABSTRACT Paper B2B 9 A Quick and Gentle Introduction to PROC SQL Shane Rosanbalm, Rho, Inc. Sam Gillett, Rho, Inc. If you are afraid of SQL, it is most likely because you haven t been properly introduced.

More information

Mapping Clinical Data to a Standard Structure: A Table Driven Approach

Mapping Clinical Data to a Standard Structure: A Table Driven Approach ABSTRACT Paper AD15 Mapping Clinical Data to a Standard Structure: A Table Driven Approach Nancy Brucken, i3 Statprobe, Ann Arbor, MI Paul Slagle, i3 Statprobe, Ann Arbor, MI Clinical Research Organizations

More information

%MAKE_IT_COUNT: An Example Macro for Dynamic Table Programming Britney Gilbert, Juniper Tree Consulting, Porter, Oklahoma

%MAKE_IT_COUNT: An Example Macro for Dynamic Table Programming Britney Gilbert, Juniper Tree Consulting, Porter, Oklahoma Britney Gilbert, Juniper Tree Consulting, Porter, Oklahoma ABSTRACT Today there is more pressure on programmers to deliver summary outputs faster without sacrificing quality. By using just a few programming

More information

Symbol Table Generator (New and Improved) Jim Johnson, JKL Consulting, North Wales, PA

Symbol Table Generator (New and Improved) Jim Johnson, JKL Consulting, North Wales, PA PharmaSUG2011 - Paper AD19 Symbol Table Generator (New and Improved) Jim Johnson, JKL Consulting, North Wales, PA ABSTRACT In Seattle at the PharmaSUG 2000 meeting the Symbol Table Generator was first

More information

Validation Summary using SYSINFO

Validation Summary using SYSINFO Validation Summary using SYSINFO Srinivas Vanam Mahipal Vanam Shravani Vanam Percept Pharma Services, Bridgewater, NJ ABSTRACT This paper presents a macro that produces a Validation Summary using SYSINFO

More information

David S. Septoff Fidia Pharmaceutical Corporation

David S. Septoff Fidia Pharmaceutical Corporation UNLIMITING A LIMITED MACRO ENVIRONMENT David S. Septoff Fidia Pharmaceutical Corporation ABSTRACT The full Macro facility provides SAS users with an extremely powerful programming tool. It allows for conditional

More information

Foundations and Fundamentals. SAS System Options: The True Heroes of Macro Debugging Kevin Russell and Russ Tyndall, SAS Institute Inc.

Foundations and Fundamentals. SAS System Options: The True Heroes of Macro Debugging Kevin Russell and Russ Tyndall, SAS Institute Inc. SAS System Options: The True Heroes of Macro Debugging Kevin Russell and Russ Tyndall, SAS Institute Inc., Cary, NC ABSTRACT It is not uncommon for the first draft of any macro application to contain errors.

More information

Paper DB2 table. For a simple read of a table, SQL and DATA step operate with similar efficiency.

Paper DB2 table. For a simple read of a table, SQL and DATA step operate with similar efficiency. Paper 76-28 Comparative Efficiency of SQL and Base Code When Reading from Database Tables and Existing Data Sets Steven Feder, Federal Reserve Board, Washington, D.C. ABSTRACT In this paper we compare

More information

Coders' Corner. Paper Scrolling & Downloading Web Results. Ming C. Lee, Trilogy Consulting, Denver, CO. Abstract.

Coders' Corner. Paper Scrolling & Downloading Web Results. Ming C. Lee, Trilogy Consulting, Denver, CO. Abstract. Paper 71-25 Scrolling & Downloading Web Results Ming C. Lee, Trilogy Consulting, Denver, CO Abstract Since the inception of the INTERNET and Web Browsers, the need for speedy information to make split

More information

Statistics and Data Analysis. Common Pitfalls in SAS Statistical Analysis Macros in a Mass Production Environment

Statistics and Data Analysis. Common Pitfalls in SAS Statistical Analysis Macros in a Mass Production Environment Common Pitfalls in SAS Statistical Analysis Macros in a Mass Production Environment Huei-Ling Chen, Merck & Co., Inc., Rahway, NJ Aiming Yang, Merck & Co., Inc., Rahway, NJ ABSTRACT Four pitfalls are commonly

More information

rpms: An R Package for Modeling Survey Data with Regression Trees

rpms: An R Package for Modeling Survey Data with Regression Trees rpms: An R Package for Modeling Survey Data with Regression Trees Daniell Toth U.S. Bureau of Labor Statistics Abstract In this article, we introduce the R package, rpms (Recursive Partitioning for Modeling

More information

A Useful Macro for Converting SAS Data sets into SAS Transport Files in Electronic Submissions

A Useful Macro for Converting SAS Data sets into SAS Transport Files in Electronic Submissions Paper FC07 A Useful Macro for Converting SAS Data sets into SAS Transport Files in Electronic Submissions Xingshu Zhu and Shuping Zhang Merck Research Laboratories, Merck & Co., Inc., Blue Bell, PA 19422

More information

Because We Can: Using SAS System Tools to Help Our Less Fortunate Brethren John Cohen, Advanced Data Concepts, LLC, Newark, DE

Because We Can: Using SAS System Tools to Help Our Less Fortunate Brethren John Cohen, Advanced Data Concepts, LLC, Newark, DE SESUG 2015 CC145 Because We Can: Using SAS System Tools to Help Our Less Fortunate Brethren John Cohen, Advanced Data Concepts, LLC, Newark, DE ABSTRACT We may be called upon to provide data to developers

More information

Recognition of Tokens

Recognition of Tokens Recognition of Tokens Lecture 3 Section 3.4 Robb T. Koether Hampden-Sydney College Mon, Jan 19, 2015 Robb T. Koether (Hampden-Sydney College) Recognition of Tokens Mon, Jan 19, 2015 1 / 21 1 A Class of

More information

Two useful macros to nudge SAS to serve you

Two useful macros to nudge SAS to serve you Two useful macros to nudge SAS to serve you David Izrael, Michael P. Battaglia, Abt Associates Inc., Cambridge, MA Abstract This paper offers two macros that augment the power of two SAS procedures: LOGISTIC

More information

Statistical Analysis of MRI Data

Statistical Analysis of MRI Data Statistical Analysis of MRI Data Shelby Cummings August 1, 2012 Abstract Every day, numerous people around the country go under medical testing with the use of MRI technology. Developed in the late twentieth

More information

PharmaSUG Paper AD06

PharmaSUG Paper AD06 PharmaSUG 2012 - Paper AD06 A SAS Tool to Allocate and Randomize Samples to Illumina Microarray Chips Huanying Qin, Baylor Institute of Immunology Research, Dallas, TX Greg Stanek, STEEEP Analytics, Baylor

More information

- 1 - Fig. A5.1 Missing value analysis dialog box

- 1 - Fig. A5.1 Missing value analysis dialog box WEB APPENDIX Sarstedt, M. & Mooi, E. (2019). A concise guide to market research. The process, data, and methods using SPSS (3 rd ed.). Heidelberg: Springer. Missing Value Analysis and Multiple Imputation

More information

A Macro Application on Confidence Intervals for Binominal Proportion

A Macro Application on Confidence Intervals for Binominal Proportion A SAS @ Macro Application on Confidence Intervals for Binominal Proportion Kaijun Zhang Sheng Zhang ABSTRACT: FMD K&L Inc., Fort Washington, Pennsylvanian Confidence Intervals (CI) are very important to

More information

A Simple Time Series Macro Scott Hanson, SVP Risk Management, Bank of America, Calabasas, CA

A Simple Time Series Macro Scott Hanson, SVP Risk Management, Bank of America, Calabasas, CA A Simple Time Series Macro Scott Hanson, SVP Risk Management, Bank of America, Calabasas, CA ABSTRACT One desirable aim within the financial industry is to understand customer behavior over time. Despite

More information

A Combined Encryption Compression Scheme Using Chaotic Maps

A Combined Encryption Compression Scheme Using Chaotic Maps BULGARIAN ACADEMY OF SCIENCES CYBERNETICS AND INFORMATION TECHNOLOGIES Volume 13, No 2 Sofia 2013 Print ISSN: 1311-9702; Online ISSN: 1314-4081 DOI: 10.2478/cait-2013-0016 A Combined Encryption Compression

More information
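With f, d, and S defined as above, the Taylor linearization variance estimate is the quadratic form v(f) = d′Sd — the evaluated derivative vector pre- and post-multiplying the variance-covariance matrix. The following is a minimal sketch of that matrix computation, written in Python rather than the PROC IML code the paper describes, purely for illustration; the function name, the ratio-estimator example, and all numbers are invented for the example.

```python
def taylor_variance(d, S):
    """Linearized variance estimate v(f) = d' S d, where d is the vector of
    partial derivatives of f evaluated at the point estimates and S is the
    variance-covariance matrix of the function variables."""
    n = len(d)
    return sum(d[i] * S[i][j] * d[j] for i in range(n) for j in range(n))

# Illustrative numbers for a ratio estimator R = x / y, a common
# non-linear function of two random variables.  The derivative vector is
# (dR/dx, dR/dy) = (1/y, -x/y**2), evaluated at the point estimates.
x, y = 10.0, 5.0
d = [1.0 / y, -x / y ** 2]        # = [0.2, -0.4]
S = [[4.0, 1.0],                  # hypothetical variance-covariance matrix
     [1.0, 9.0]]
v = taylor_variance(d, S)
```

In PROC IML the same estimate is a one-line matrix expression (`v = d` * S * d;` with `d` a column vector), which is the simple matrix multiplication the abstract refers to.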