PROGRAMMING ROLLING REGRESSIONS IN SAS
MICHAEL D. BOLDIN, UNIVERSITY OF PENNSYLVANIA, PHILADELPHIA, PA


ABSTRACT
SAS does not have an option for PROC REG (or any of its other equation estimation procedures) that will compute either recursive least squares or rolling regressions, i.e., econometric procedures in which the same linear equation is estimated multiple times using either a growing sample or partially overlapping subsamples. This paper shows how a macro subroutine can achieve the desired effect, and discusses possible extensions of this approach.

INTRODUCTION
Many data explorations and research projects estimate the same basic equation or relationship among variables over multiple date ranges. Rolling regressions are one example of an econometric procedure in this category. In a rolling regression, least-squares techniques are used to fit a linear equation (and estimate the corresponding coefficients) multiple times using partially overlapping subsamples of a larger data set. The procedure is typically applied to time series data in a way that keeps the sample length fixed for each estimation step by advancing the beginning and ending dates by the same time or date increment. A related technique that keeps the starting date fixed is known as recursive least squares. In both cases, it is the overlapping sample periods that distinguish these procedures from one in which a particular equation and its regression coefficients are estimated multiple times over distinct (non-overlapping) periods. Unfortunately, SAS does not have a simple option that can be added to PROC REG, or to any of its other model or equation estimation procedures, to run rolling regressions (and related variants, such as recursive least squares).
Because of the repetitive nature of the calculations, however, a SAS macro is well suited to the task, especially a macro routine that uses SAS date values and date functions to define the estimation window for each subsample. SAS macro code (%ROLLINGREG) and a fully functioning program are presented below, in which a loop nests a PROC REG step that has a BY statement. The BY statement adds a cross-sectional element to the time-series analysis and simplifies cases that would otherwise require setting up a distinct rolling regression for each cross-section of a data set. Both simpler and more complicated equations and models can be handled with the same approach. Most important, the example code saves the estimated coefficients and other results from each subsample to a permanent data set for further analysis.

The example presented below is set up for a fixed 36-month estimation window that moves forward one month in each iteration of the loop until it reaches the end of the full sample period. If a 36-month window is not desired, the length of the estimation window is easy to change. The macro also accommodates different data frequencies, so that daily, weekly, or annual dates can be used. In addition, it is relatively straightforward to allow for other types of overlapping date ranges, such as the type needed for recursive least squares. In sum, the macro subroutine provides a general and useful starting point for applying rolling regression techniques and their variants in SAS.

ROLLING SAMPLE PERIOD DESIGNS
The table below helps to conceptualize the loops and windows for a rolling sample case that uses monthly data running from January to December. Each column represents one loop or iteration, which defines a particular window or subsample.
In this example, each subsample is 3 months long and the window moves ahead 1 month per loop, so that 10 loops, each covering a different 3-month range, are run.
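The counting behind these 10 loops can be stated generally. In the notation below (introduced here for illustration, not taken from the paper), T is the number of periods in the full sample, N is the window length, and S is the number of periods the window end moves per loop, which are the roles that the n= and s= arguments play in the appendix macro. A window is run only if its end still fits inside the sample, hence the floor.

```latex
\[
\begin{aligned}
&\text{Window } k \text{ covers periods } (k-1)S + 1 \text{ through } (k-1)S + N,
  \qquad k = 1, \dots, K, \qquad
  K = \left\lfloor \frac{T - N}{S} \right\rfloor + 1 .\\
&\text{Fig. 1.0: } T = 12,\; N = 3,\; S = 1 \;\Rightarrow\; K = \frac{12 - 3}{1} + 1 = 10;
  \text{ window 1 is periods 1--3 (Jan--Mar), window 10 is periods 10--12 (Oct--Dec).}\\
&\text{Recursive least squares (fixed start): window } k \text{ covers periods } 1 \text{ through } N + (k-1)S .
\end{aligned}
\]
```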

Fig. 1.0 ROLLING SAMPLE EXAMPLE

Loop 1   Loop 2   Loop 3   Loops 4 to 8          Loop 9   Loop 10
Jan      Feb      Mar      (details not shown)   Sep      Oct
Feb      Mar      Apr                            Oct      Nov
Mar      Apr      May                            Nov      Dec

It is easy to imagine a case in which the full period covers more than one year and two adjacent windows overlap by more than two months (so that each window is longer than three months). Daily data could also be used in the design above, with the first date of each subsample being the first day of the window's first month and the last date being the last day of the window's last month. There are also cases where it is both possible and desirable to move forward more than one month (or one period) in each loop step, or to keep the first date of each subsample fixed and let the sample length grow as the loop progresses.

USING DATE VARIABLES IN PROC REG WITH A BY STATEMENT
A preliminary example will help in understanding the full rolling regression example. Consider a data set that has both cross-sectional and time-series aspects, with a structure that allows for a BY variable in PROC REG. In particular, assume that the BY variable makes it easy to use PROC REG to estimate OLS coefficients for each distinct cross-sectional group (where the variable values identify a person, geographical region, company, or any other group concept). For example, a data set named DSET1 might have the following columns: ID, DATE, Y, X1, X2. In this case, the observations have an explicit cross-sectional identifier (ID). The time-series aspect is the DATE variable, which could be daily, monthly, quarterly, or annual in frequency. In this example, Y is the dependent variable and X1 and X2 are the potential explanatory variables. Most important, it is assumed that the data are sorted (or can be sorted) by ID and DATE, so that within each ID block the observations are in DATE order.

The ID dimension in the assumed data format makes SAS a good choice for these types of estimation problems. As long as the data are sorted in ID and DATE order, PROC REG can be used to estimate coefficients for each ID:

proc reg noprint data=dset1 outest=regout1 edf;
   where date between '01JAN'd and '31DEC2002'd;
   model y = x1 x2;
   by id;
run;

The code above places the OLS coefficients, R-square, and a few other statistics and identifiers for a two-year period in an output data set (OUTEST=REGOUT1 in this case). Suppose DSET1 has monthly data for 5 IDs; the results might then be as shown in Figure 2.0.

Fig. 2.0 PROC REG OUTEST= RESULTS

ID     DEPVAR  RMSE   Intercept  x1      x2      P  EDF  RSQ
10107  y       0.008  -0.008     0.895    1.327  3  249  0.417
11081  y       0.011  -0.008     0.935    1.269  3  249  0.324
12490  y       0.007  -0.007     0.809    2.264  3  249  0.409
14593  y       0.020  -0.014     1.366    0.776  3  249  0.387
14656  y       0.016   0.006     0.828   -0.313  3  249  0.128

The basic task of the rolling regression macro is to change the date range so that it covers different periods in an iterative (looping) fashion, and to append the results from each loop to a single data set that holds the OLS statistics for each identifier and date range.
In other words, a statement such as where date between &date1 and &date2; is needed, with the macro routine defining &date1 and &date2 for each window. The macro routine should also accept input arguments that include the input and output data set names, the regression model specification, and the identifier variables.

ROLLING REGRESSION MACRO
To put the ideas above into practice, an outline of a block of macro code is given below:

%let date2 = <window end point in first loop>;
%do %while(&date2 <= &end_date);
   ** (a) define &date1 as window start date;
   ** (b) create data subsample using where date between &date1 and &date2;
   ** (c) use PROC REG on the subsample;
   ** (d) process and save the results;
   ** (e) increment date2 forward for the next loop or end the iterations;
%end;
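A runnable skeleton of this outline is sketched below, with substeps (b) through (d) left as comments. The macro name %SHOW_WINDOWS, its arguments, and the global variable N_WINDOWS are hypothetical and not part of %ROLLINGREG; the sketch only writes each window's start and end dates to the log, using INTNX with BEGIN and END alignment in the same way as the appendix macro.

```sas
%* Hypothetical skeleton of the outline: list each rolling window in the log;
%macro show_windows(start_date=, end_date=, freq=month, s=1, n=36);
  %local sdate edate date1 date2 k;
  %global n_windows;
  %let sdate = %sysfunc(inputn(&start_date,date9.));
  %let edate = %sysfunc(inputn(&end_date,date9.));

  %* Initial setting: the first window ends n periods after the start date;
  %let date2 = %sysfunc(intnx(&freq,&sdate,%eval(&n-1),end));
  %let k = 0;

  %do %while(&date2 <= &edate);
    %let k = %eval(&k+1);
    %* (a) the window starts n-1 periods before its end date;
    %let date1 = %sysfunc(intnx(&freq,&date2,%eval(1-&n),begin));
    %put Loop &k: %sysfunc(putn(&date1,date9.)) to %sysfunc(putn(&date2,date9.));
    %* (b)-(d) subset with a WHERE statement, run PROC REG, append results;
    %* (e) move the window end forward s periods;
    %let date2 = %sysfunc(intnx(&freq,&date2,&s,end));
  %end;

  %* Hypothetical helper: number of windows visited;
  %let n_windows = &k;
%mend show_windows;

%show_windows(start_date=01JAN2002, end_date=31DEC2006)
```

With these arguments the log should list 25 windows, the first running from 01JAN2002 to 31DEC2004, which is the first window in the example results of this paper.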

Here %do %while( ) ... %end starts and controls the loops. The substeps inside the loop can be combined or carried out in a slightly different order: (b), (c), and (d) define the data and statistical functions, while substeps (a) and (e), together with the initial setting of &date2, are the key to running the correct loops. There are various ways to set &date1 and &date2 and to handle different types of loops (and thus different types of rolling regressions). In all cases it is important to recognize how the dates and frequencies are interrelated, as displayed in Figure 1.0 above. My application makes extensive use of the SAS function INTNX, which can move a date forward or backward a set number of periods, find the proper last day of a month, quarter, or year, and correctly handle leap years.

The appendix presents the full macro as %ROLLINGREG, with input arguments that include the name of the main date variable in the data set and parameters that define the start and end dates for the entire analysis and for each loop. The macro is called as

%rollingreg(data= input1, out_ds= input2, model_equation= input3, id= input4, date= input5, start_date= input6, end_date= input7, freq= input8, s= input9, n= input10, regprint= input11);

Of the macro's arguments, data, out_ds, and model_equation are fairly self-explanatory and are the only required inputs. The macro uses named (keyword) arguments rather than positional arguments, so that some parameters can take default values and the order of the inputs does not matter. The data and out_ds designations will typically be two-level names such as data=mylib.md and out_ds=mylib.mregout. To specify the model equation, an example might be model_equation= y = x1 x2, where the two equal signs are intentional.
The first equal sign separates the argument name from the equation, and the second separates the left-hand side from the right-hand side. An equation such as this one assumes that an intercept is desired; to exclude the intercept, add / noint (for example, model_equation= y = x1 x2 / noint). In fact, any additional options that are valid in the MODEL statement of PROC REG can be used. The id argument defines the cross-sectional element for the BY statement in PROC REG. If id is omitted when invoking the macro, all dates are assumed to be grouped together (i.e., only the time-series dimension is used). For the dates that define the windows of each loop, date is the default (assumed) date variable in the input set. This can be changed by naming the date variable, e.g., date=xdate. If date=year or the date variable name starts with year (such as date=yearx), then the variable's values must be 4-digit years. In all other cases, the date variable must be a SAS date, which is a count of days with January 1, 1960 as day 0. SAS dates are especially useful because they allow frequency intervals of a day, week, month, quarter, or year to define the loops, so that each regression period is based on date-range endpoints that are S periods apart and sample periods that are N periods long. Note that start_date and end_date specify the date range for the entire analysis: start_date is the first date in the first loop's date range, and end_date is the last date in the last loop's range. If these two date parameters are not set, the macro uses the entire date range in the data set (technically, the minimum and maximum date values are determined and used). Valid formats for the date parameters are 01JAN2004, 1-1-2004, 1/1/2004, JAN2004, and 2004.
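The conversions that the appendix macro applies to the accepted date forms can be tried directly in a DATA step. The sketch below uses the same informats as the macro (DATE9. and MMDDYY10.), the same "01" and "01jan" padding for month-year and year-only start dates, and INTNX with END alignment or a "31dec" prefix for end dates; the data set DATE_FORMS and its variable names are hypothetical.

```sas
/* Hypothetical demo: turn each accepted date string into a SAS date number */
data date_forms;
  /* start_date forms: each resolves to January 1, 2004 (SAS date 16071) */
  s_date9 = input('01JAN2004', date9.);
  s_dash  = input('1-1-2004', mmddyy10.);
  s_slash = input('1/1/2004', mmddyy10.);
  s_month = input('01' || 'JAN2004', date9.);  /* MONyyyy: first day of month */
  s_year  = input('01jan' || '2004', date9.);  /* yyyy: January 1             */

  /* end_date forms: month-year and year-only inputs move to the period end */
  e_month = intnx('month', input('01' || 'DEC2004', date9.), 0, 'end');
  e_year  = input('31dec' || '2004', date9.);

  format s_date9 s_dash s_slash s_month s_year e_month e_year date9.;
  put s_date9= s_dash= s_slash= s_month= s_year= e_month= e_year=;
run;
```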
The macro code converts any of these formats to the proper date number: a 4-digit year becomes the date number for January 1 of that year when used with start_date and for December 31 of that year when used with end_date. Similarly, month-year arguments become the date numbers for the first and last day of the month for start_date and end_date, respectively.

The parameters S and N define, respectively, the interval between the ends of successive sample periods and the length of each sample period, both counted in units of freq. The default looping frequency is months (i.e., monthly), so that all date counting that defines the loops is based on months, even if the underlying data in the analysis are daily. Because freq is passed directly to INTNX, it must be an interval name that INTNX recognizes, such as day, week, month, qtr, or year. The default values S=1 and N=12 set each loop's sample period to 12 months and move forward one month at a time (the same one-month step used in Figure 1.0, with longer windows). To move forward one year at a time and use 24 months as each loop's sample length, you can use freq=month, S=12, and N=24, or equivalently freq=year, S=1, and N=2.

EXAMPLE RESULTS
The appendix has an example that uses %ROLLINGREG to compute betas for a set of stocks, where this coefficient measures the sensitivity of a company's stock return to the overall stock market. Figure 3.0 shows an example of the results, with each PERMNO block representing one cross-section of the stocks in the sample.

Fig. 3.0 ROLLING REGRESSION OUTPUT

PERMNO=10107
date1      date2      RMSE      Intercept    beta     regobs
01JAN2002  31DEC2004  0.057470  -.005836843  0.97255  36
01FEB2002  31JAN      0.057446  -.004874504  0.95958  36
01MAR2002  28FEB      0.057358  -.004550887  0.92014  36
01APR2002  31MAR      0.057448  -.005048308  0.92846  36
01MAY2002  30APR      0.056784  -.000042066  0.81758  36
01JUN2002  31MAY      0.056721  0.000278585  0.80891  36
01JUL2002  30JUN      0.051923  -.006355348  0.99118  36
01AUG2002  31JUL      0.051481  -.004585000  0.91877  36
01SEP2002  31AUG      0.053227  -.002505193  0.89010  36
01OCT2002  30SEP      0.054340  -.003183394  0.84427  36
01NOV2002  31OCT      0.045522  -.003110266  0.52825  36
01DEC2002  30NOV      0.045962  -.002508023  0.50723  36
01JAN2003  31DEC      0.044918  0.000198639  0.35030  36

PERMNO=11081
date1      date2      RMSE      Intercept    beta     regobs
01JAN2002  31DEC2004  0.055252  0.008974707  0.94293  36
01FEB2002  31JAN      0.055182  0.008669580  0.94526  36
01MAR2002  28FEB      0.054267  0.009601962  0.88876  36
01APR2002  31MAR      0.054588  0.008332045  0.89608  36
01MAY2002  30APR      0.055504  0.004430406  0.97562  36
[additional lines not shown]

In the output above, date1 and date2 show the date range of the sample that corresponds to the estimates in each row. Both dates are shown intentionally: it is tempting to associate the estimated coefficients only with the last date of each window (date2), when it is better to think of the coefficients as representing the average effect over the entire date1-date2 period.
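One way to reflect this interpretation in a chart is to plot each estimate against the middle of its window rather than against date2. The sketch below is an illustration only, not part of %ROLLINGREG: it reads date1 and beta from the first four PERMNO=10107 rows of Fig. 3.0, rebuilds date2 as the last day of the 36th month with INTNX, and places each beta at a window midpoint. The data set RR_DEMO and the variable MID are hypothetical names, and the midpoint is just one possible plotting choice.

```sas
/* Illustration only: date1 and beta from the first four PERMNO=10107 rows */
data rr_demo;
  input date1 :date9. beta regobs;
  /* a 36-month window ends on the last day of its 36th month */
  date2 = intnx('month', date1, 35, 'end');
  /* hypothetical plotting position: the middle day of the window */
  mid = date1 + floor((date2 - date1) / 2);
  format date1 date2 mid date9.;
  datalines;
01JAN2002 0.97255 36
01FEB2002 0.95958 36
01MAR2002 0.92014 36
01APR2002 0.92846 36
;
run;

/* Plot each beta at its window midpoint instead of at date2 */
proc sgplot data=rr_demo;
  series x=mid y=beta / markers;
  xaxis label="Window midpoint";
  yaxis label="Estimated beta";
run;
```

For the first row, the rebuilt date2 is 31DEC2004 (matching Fig. 3.0) and the midpoint is 02JUL2003.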
Note also that a count of regression observations (regobs) is included, which helps to identify windows in which a block has missing observations.
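The regobs count relies on the EDF option: with it, the OUTEST= data set from PROC REG contains _P_ (the number of estimated parameters, including the intercept) and _EDF_ (the error degrees of freedom), and their sum is the number of observations actually used. The self-contained sketch below checks this on simulated data. The data sets SIM and EST and all simulated values are hypothetical, and one Y value for ID 2 is set to missing so that its regression uses one fewer observation.

```sas
/* Hypothetical simulated panel: 2 IDs x 24 month-end dates */
data sim;
  call streaminit(2007);
  do id = 1 to 2;
    do m = 0 to 23;
      date = intnx('month', '01JAN2002'd, m, 'end');
      x1 = rand('normal');
      y  = 0.5 + 1.2*x1 + rand('normal', 0, 0.1);
      /* drop one observation from ID 2 so its count differs */
      if id = 2 and m = 5 then y = .;
      output;
    end;
  end;
  format date date9.;
  drop m;
run;

/* EDF adds _P_ and _EDF_ (among others) to the OUTEST= data set */
proc reg data=sim outest=est edf noprint;
  model y = x1;
  by id;
run;
quit;

/* Parameters plus error degrees of freedom = observations used */
data est;
  set est;
  regobs = _p_ + _edf_;
run;

proc print data=est;
  var id _rmse_ intercept x1 _p_ _edf_ regobs;
run;
```

ID 1 should show regobs=24 and ID 2 regobs=23. In a rolling run, windows whose regobs falls below the intended window length (36 in the appendix example) can then be picked out with a filter such as where regobs < 36.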

EXTENSIONS AND CAUTIONS
For both simple and complicated regression equations, %ROLLINGREG can handle different window and looping parameters and estimate many types of rolling sample periods. As explained above, the loop interval can be set to be longer than the data interval (for example, monthly loops with daily data), so that the windows step forward more than one data period per loop. In such cases, it is both prudent and useful to review the date1 and date2 variables in the output to check that the desired date intervals and effects are being captured.

More extensive changes can be made by replacing the PROC REG step in the macro with another statistical procedure in SAS. Very simple statistics such as moving averages could be computed with PROC MEANS or PROC SUMMARY in the same basic steps; however, PROC EXPAND or a single pass through a DATA step is almost always a better choice. Using PROC GLM for a panel estimation, or PROC MODEL for a system of possibly nonlinear equations, within %ROLLINGREG would be reasonable, however. In fact, it is possible to generalize the macro with an extra input argument, such as procedure= %proc_macro, that calls another macro defining the statistical procedure. In applying this idea, it is important to be careful about the scope of the macro variables and input arguments that are passed through or shared by the different macro routines. Because of this issue, it might be easier to modify %ROLLINGREG directly (replacing the PROC REG block with code for another procedure) and give the modified macro a new name such as %ROLLING_MODELNAME.

Finally, as a caution in interpreting the results, it is important to recognize that although the estimated coefficients of a rolling regression change over time, they are not time-varying parameters in the classical sense.
SAS has procedures to compute such models, and rolling regression results are better thought of as a type of specification test or robustness check for determining whether the parameters of an equation are stable over time. In other words, one use of rolling regressions is to determine whether a true time-varying parameter model is more appropriate than a model that assumes fixed parameters.

CONCLUSIONS
Because the SAS macro language can work with SAS date functions, a macro subroutine is a good choice for creating the loops that produce rolling regression results. %ROLLINGREG is a flexible example of this approach. The basic case generalizes PROC REG to move forward one date period at a time from a set starting point to the end of the sample. It is also possible to make the loop interval longer than the data interval. In other words, when using %ROLLINGREG with daily data, you do not need to move forward one day at each step; monthly, quarterly, or annual loop intervals are also accommodated. In addition, the design of the macro can accommodate other equation estimation procedures, such as PROC GLM and PROC MODEL.

ACKNOWLEDGMENTS
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

CONTACT INFORMATION
Your comments and questions are valued and encouraged. Contact the author at:
Michael D. Boldin
Wharton Research Data Services
The Wharton School, The University of Pennsylvania
216 Vance Hall
3733 Spruce Street
Philadelphia PA 19104-6301
boldinm@wharton.upenn.edu

APPENDIX

/**********************************************************
 ROLLINGREG - Rolling Regression Macro (v1.2)
 M Boldin July 2007

 Rolling regression: a least-squares equation is estimated
 multiple times using partially overlapping subsamples from a
 larger set. This application keeps the sample length fixed and
 increases the beginning and ending dates by a particular 'date'
 increment. OLS coefficients from each iteration or loop are
 saved in an output set.

 The data set for the application can have both cross-sectional
 and time-series aspects that allow a 'BY' variable in PROC REG
 to estimate OLS coefficients (by company or stock).

 The macro routine uses 'named' input arguments:
   data           = input set
   out_ds         = output set for saving results, such as OLS coefficients
   model_equation = valid model statement for PROC REG
   id             = cross-sectional identifier (default: none, pure time series)
   date           = date variable name (default: date)
   start_date     = first date in analysis (default: first date in data)
   end_date       = last date in analysis (default: last date in data)
   freq           = frequency of loop interval (default: month; not
                    necessarily the same as the data dates)
   s              = freq periods for moving the loop end date forward
                    (default: 1, i.e., 1 month)
   n              = length of sample period in terms of freq
                    (default: 12, i.e., 12 months)
   regprint       = use yes to show regression output (default: noprint)

 Only the first three inputs (data, out_ds, and model_equation)
 are required; the remainder have default settings. The data= and
 out_ds= designations may be two-level names such as data=mylib.md
 and out_ds=mylib.mregout.

 Date is the default or assumed date identifier in the input set.
 If date=year or the name starts with 'year' (such as date=yeara),
 then the date variable's values must be 4-digit years. In all
 other cases, the date variable must be a SAS date, a number of
 days counted from January 1, 1960 (day 0).
 Valid formats for the date parameters are 01JAN2004, 1-1-2004,
 1/1/2004, JAN2004, and 2004.

 The parameters s and n (in units of freq) define the interval
 between the ends of successive sample periods and the length of
 each sample period. The default looping frequency is months
 (i.e., monthly), so all date counting that defines the loops is
 based on months, even if the underlying data are daily. To
 iterate forward one year at a time and use 24 months as each
 loop sample length, use freq=month, s=12, and n=24, or
 equivalently freq=year, s=1, and n=2.

 Output example where ID=PERMNO and n=36 (months):
   permno  date1      date2      _RMSE_    Intercept    VWRETD   regobs
   10107   01JAN2002  31DEC2004  0.057470  -.005836843  0.97255  36
   10107   01FEB2002  31JAN      0.057446  -.004874504  0.95958  36
   10107   01MAR2002  28FEB      0.057358  -.004550887  0.92014  36

 In the output, 'date1' and 'date2' show the date range for the
 sample that corresponds to the estimates shown in each row.
 Regobs is a count of regression observations.
**********************************************************/
%macro rollingreg (
   data=, out_ds=, model_equation=,
   id=, date=date, start_date=, end_date=,
   freq=month, s=1, n=12, regprint=noprint
   );

%* Start with empty output data sets;
proc datasets nolist;
   delete _all_ds _outest_ds;
quit;

%* Prepare input data for by-id-date use;
proc sort data=&data;
   by &id &date;
run;

%* Set the by-id variable (blank default: no BY variable);
%let by_id= ;
%if %length(&id) > 0 %then %let by_id= by &id;

%* Determine date range variables (year_date=1 if the date name starts with year);
%if %index(%lowcase(&date),year) = 1 %then %let year_date=1;
%else %let year_date=0;
%let sdate1 = &start_date;
%let sdate2 = &end_date;

%* Make start and end date if missing;
%if &start_date = %str() | &end_date = %str() %then %do;
   proc sql noprint;
      create table _dx1 as
      select min(&date) as min_date, max(&date) as max_date
      from &data
      where not missing(&date);
      select min_date into :min_date from _dx1;
      select max_date into :max_date from _dx1;
   quit;
%end;

%* SDATE1 and SDATE2 put in SAS date number form (1/1/1960=0);
%if &sdate1 = %str() %then %do;
   %let sdate1= &min_date;
%end;
%else %do;
   %if (%index(&sdate1,%str(-)) > 1) | (%index(&sdate1,%str(/)) > 1) %then
      %let sdate1= %sysfunc(inputn(&sdate1,mmddyy10.));
   %else %if (%length(&sdate1)=7) %then
      %let sdate1= %sysfunc(inputn(01&sdate1,date9.));
   %else %if (%length(&sdate1)=8 | %length(&sdate1)=9) %then
      %let sdate1= %sysfunc(inputn(&sdate1,date9.));
   %else %if (%length(&sdate1)=4) %then
      %let sdate1= %sysfunc(inputn(01jan&sdate1,date9.));
   %if &year_date=1 %then %let sdate1=%sysfunc(year(&sdate1));
%end;

%if &sdate2 = %str() %then %do;
   %let sdate2= &max_date;
%end;
%else %do;
   %if (%index(&sdate2,%str(-)) > 1) | (%index(&sdate2,%str(/)) > 1) %then
      %let sdate2= %sysfunc(inputn(&sdate2,mmddyy10.));
   %else %if (%length(&sdate2)=7) %then %do;
      %let sdate2= %sysfunc(inputn(01&sdate2,date9.));
      %let sdate2= %sysfunc(intnx(month,&sdate2,0,end));
   %end;
   %else %if (%length(&sdate2)=8 | %length(&sdate2)=9) %then
      %let sdate2= %sysfunc(inputn(&sdate2,date9.));
   %else %if (%length(&sdate2)=4) %then
      %let sdate2= %sysfunc(inputn(31dec&sdate2,date9.));
   %if &year_date=1 %then %let sdate2=%sysfunc(year(&sdate2));
%end;

%* Determine loop frequency parameters;
%* If n is blank (or 0), use the 1-period (=&s) assumption;
%if %length(&n) = 0 %then %let n= &s;
%else %if &n = 0 %then %let n= &s;
%* Year frequency case;
%if &year_date=1 %then %let freq=year;

%put Date variable: &date year_date: &year_date;
%put Start and end dates: &start_date &end_date // &sdate1 &sdate2;
%if &year_date=0 %then %put %sysfunc(putn(&sdate1,date9.)) %sysfunc(putn(&sdate2,date9.));
%put Freq: &freq s: &s n: &n;

%* Preliminary date setting for each iteration/loop;
%* First end date (idate2) is n periods after the start date;
%if &year_date=1 %then %let idate2= %eval(&sdate1+(&n-1));
%else %let idate2= %sysfunc(intnx(&freq,&sdate1,(&n-1),end));
%if &year_date=0 %then %let idate1= %sysfunc(intnx(&freq,&idate2,-&n+1,begin));
%else %let idate1= %eval(&idate2-&n+1);
%put First loop: &idate1 -- &idate2;
%put Loop through: &sdate2;

%if (&idate2 > &sdate2) %then %do;
   %* Dates are not acceptable-- show problem, do not run loop;
   %put PROBLEM-- end date for loop exceeds range : ( &idate2 > &sdate2 );
%end;
%else %do;
   %* Dates are accepted-- run loops;
   %let jj=0;
   %do %while(&idate2 <= &sdate2);
      %let jj=%eval(&jj+1);

      %* Define loop start date (idate1) based on inherited end date (idate2);
      %if &year_date=0 %then %do;
         %let idate1= %sysfunc(intnx(&freq,&idate2,-&n+1,begin));
         %let date1c= %sysfunc(putn(&idate1,date9.));
         %let date2c= %sysfunc(putn(&idate2,date9.));
      %end;
      %if &year_date=1 %then %do;
         %let idate1= %eval(&idate2-&n+1);
         %let date1c= &idate1;
         %let date2c= &idate2;
      %end;
      %let idate1= %sysfunc(max(&sdate1,&idate1));
      %put Loop: &jj -- &date1c &date2c;
      %put &jj -- &idate1 &idate2;

      proc datasets nolist;
         delete _outest_ds;
      quit;

      %***** Analysis code here -- for each loop;
      %* noprint just makes the output set; regprint=yes (or print) shows output;
      %let noprint= noprint;
      %if %upcase(&regprint) = YES | %upcase(&regprint) = PRINT %then %let noprint= ;

      proc reg data=&data outest=_outest_ds edf &noprint;
         where &date between &idate1 and &idate2;
         model &model_equation;
         &by_id;
      run;
      quit;

      %* Add loop date range variables to output set;
      data _outest_ds;
         set _outest_ds;
         regobs= _p_ + _edf_;   %* number of observations in regression;
         date1= &idate1;
         date2= &idate2;
         %if &year_date=0 %then %do;
            format date1 date2 date9.;
         %end;
      run;

      %* Append results;
      proc datasets nolist;
         append base=_all_ds data=_outest_ds;
      quit;

      %* Set next loop end date;
      %if &year_date=0 %then %let idate2= %sysfunc(intnx(&freq,&idate2,&s,end));
      %else %if &year_date=1 %then %let idate2= %eval(&idate2+&s);
   %end; %* end of loop;

   %* Save output set to desired location;
   data &out_ds;
      set _all_ds;
   run;
   proc sort data=&out_ds;
      by &id date2;
   run;
%end; %* end for date check pass section;
%mend rollingreg;

***************************************************
* Run Rolling Regression using CRSP Monthly Stock data;
%include 'rollingreg.sas';
libname proj '.';

* Prepare stock return data set for selected PERMNOs;
data proj.m1;
   set crsp.msf (keep=permno date ret prc vol shrout);
   where permno in (10107, 11081, 12490, 14593, 14656) and year(date) >= ;
run;

* Add market return to each PERMNO block;
proc sql;
   create table proj.m1 as
   select a.*, b.vwretd
   from proj.m1 as a left join crsp.msi as b
      on a.date=b.date
   order by a.permno, a.date;
quit;

* Call macro;
%rollingreg( data=proj.m1, out_ds=proj.rr1, id=permno, date=date,
   model_equation= ret= vwretd,
   start_date= 1-1-2002, end_date= 12-31-,
   freq=month, s=1, n=36);

* Show results;
proc print data=proj.rr1 (rename= (vwretd=beta));
   by permno;
   id date1 date2;
   var _rmse_ intercept beta regobs;
run;
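The introduction notes that the same looping approach extends to recursive least squares, where the first date stays fixed and the sample grows. The sketch below is a hypothetical, self-contained variant, not part of %ROLLINGREG: the macro %RECURSIVE_REG, the simulated data set TS, and the output set RLS are illustrative names, and it assumes monthly data in a variable named DATE and a model y = x. Compared with the rolling loop, the only change is that date1 is pinned to the start date while date2 moves forward s months per loop.

```sas
/* Hypothetical simulated monthly series: 48 month-end dates from JAN2002 */
data ts;
  call streaminit(123);
  do m = 0 to 47;
    date = intnx('month', '01JAN2002'd, m, 'end');
    x = rand('normal');
    y = 1 + 0.8*x + rand('normal', 0, 0.5);
    output;
  end;
  format date date9.;
  drop m;
run;

%macro recursive_reg(data=, out=, start_date=, end_date=, n=12, s=1);
  %local sdate edate date2 k;
  %let sdate = %sysfunc(inputn(&start_date,date9.));
  %let edate = %sysfunc(inputn(&end_date,date9.));
  %* First window: n months beginning at the fixed start date;
  %let date2 = %sysfunc(intnx(month,&sdate,%eval(&n-1),end));
  %let k = 0;

  proc datasets nolist;
    delete &out;
  quit;

  %do %while(&date2 <= &edate);
    %let k = %eval(&k+1);
    %* The start date never moves, so each sample is larger than the last;
    proc reg data=&data outest=_est edf noprint;
      where date between &sdate and &date2;
      model y = x;
    run;
    quit;

    data _est;
      set _est;
      loop = &k;
      date1 = &sdate;
      date2 = &date2;
      regobs = _p_ + _edf_;
      format date1 date2 date9.;
    run;

    proc append base=&out data=_est;
    run;

    %* Move only the end date forward s months;
    %let date2 = %sysfunc(intnx(month,&date2,&s,end));
  %end;
%mend recursive_reg;

%recursive_reg(data=ts, out=rls, start_date=01JAN2002, end_date=31DEC2005, n=12, s=12)
```

With this call, RLS should hold four rows ending 31DEC2002 through 31DEC2005, with regobs growing from 12 to 48.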