The Piecewise Regression Model as a Response Modeling Tool


Eugene Brusilovskiy
University of Pennsylvania, Philadelphia, PA

Abstract

The general problem in response modeling is to identify a response curve and estimate the diminishing returns effect. Different approaches to response modeling in SAS are reviewed, with emphasis on their caveats (OLS segmented regression, robust regression, neural nets, and nonparametric regression). In this paper, we formulate a new problem statement for response modeling: a concave piecewise approximation of the response curve. We use artificial data to illustrate the approach. Because the accuracy of the solution depends on the signal/(signal + noise) ratio, we can obtain the exact solution when the ratio is close to one, or when the knots are known. For situations where the data are significantly contaminated and/or the knots are unknown, a three-step heuristic approach is suggested. First, we run a dummy regression with PROC REG to estimate the parameters of the dummy variables. Then we test for a structural break in the series of estimated parameters from the dummy regression, using the Chow test in PROC AUTOREG or PROC MODEL. If a change point is identified, it is treated as a knot, or as a first approximation of the knot coordinates, in the piecewise regression. Finally, we fit a piecewise concave regression with PROC NLP.

Introduction

Promotion response modeling (PRM) is a necessary decision support tool in today's highly competitive market. Since the consequences (cost, for instance) of wrong decisions are increasing, so is the role of promotion response modeling. PRM is industry-specific: for example, response modeling in the credit card industry is essentially different from that in the pharmaceutical industry. In this paper, we concentrate on the latter, where a response curve is used for evaluating the effectiveness of a promotion campaign, allocating the sales force, developing the optimal marketing mix, etc. Each of these problems may require its own definition of the response curve. In general, however, a promotion response curve is the result of PRM and can be defined as a mathematical construct (depending on the nature of its application) that relates a promotional effort to its response. In the pharmaceutical industry, the response could be the number of new prescriptions, and the promotion effort could be doctor detailing, controlling for all other promotion efforts.

As stated above, an adequate definition of the response curve is very important. Real promotion campaign data are very noisy, and the relationships between promotional efforts and the responses to them are very weak. Moreover, it is necessary to take into account the diminishing returns and the monotonically increasing nature of the response curve. In other words, we assume that the higher the promotion effort, the higher the response, up to some point where the over-promotion effect may kick in. Nonparametric regression would not be helpful here, because the resulting response curve will, as a rule, have multiple maxima and minima. Therefore, we formulate the problem of response modeling as a problem of nonlinear optimization with linear and nonlinear constraints. Specifically, we have to find a concave piecewise linear approximation of the relationship between the response and the promotion efforts. Concavity and monotonicity are necessary to reflect the diminishing returns in the resulting response curve.
The piecewise linearity condition has to hold for the sake of simplicity of the subsequent steps of decision support, which require optimization.

Problem Statement

To keep things simple, we consider only three pieces in the concave piecewise linear approximation of the response curve (see Graph 1). Note that a product is over-promoted when the marginal response becomes negative. Many authors have considered the problem of piecewise linear approximation (see, for example, (2) and (3)).

In this paper, we impose a concavity restriction that is not present in these works; moreover, we want to solve the problem in SAS. Let Y be the response, X the promotion effort, and S1 and S2 the first and second knots, respectively. Then, based on promotional data, we need to find the set of unknown parameters B0, B1, B2, B3, S1, S2 that minimizes the objective function (the sum of squared residuals), where

(1)  Y = B0 + B1*X,                                  X <= S1
     Y = B0 + B1*S1 + B2*(X - S1),                   S1 < X <= S2
     Y = B0 + B1*S1 + B2*(S2 - S1) + B3*(X - S2),    X > S2

and B1 > 0, B2 > 0, B1 - B2 > 0, B2 - B3 > 0, S1 > 0, and S2 - S1 > 0.

This formulation allows B3 to be negative. In that case, the response goes down as the promotion effort goes up; this is the over-promotion effect mentioned above. If we believe that there is no over-promotion effect, we have to impose the additional restriction that B3 be nonnegative. By this definition, the response curve Y is continuous, but not differentiable at the knots. Mathematically, the problem of finding the response curve Y is a nonlinear programming problem with a continuous but non-differentiable objective function and with linear and nonlinear constraints. This type of problem can be solved by the SAS/OR Non-Linear Programming procedure (PROC NLP). PROC NLP offers several optimization algorithms, but only one of them can be used here, the Nelder-Mead simplex method (NMSIMP): it is the only algorithm that requires no first- or second-order derivatives and that allows boundary constraints, linear constraints, and nonlinear constraints (1).

Caveats and the Solution

The problem of finding the concave piecewise linear response curve looks very simple, but this is just an illusion. Even when the data do not contain any noise, the estimation of the concave piecewise linear response curve does not always lead to a precise solution. Consider the following example, where we assume that the response curve consists of three linear pieces:

(2)  Y = A0 + A1*X,                                  X <= T1
     Y = A0 + A1*T1 + A2*(X - T1),                   T1 < X <= T2
     Y = A0 + A1*T1 + A2*(T2 - T1) + A3*(X - T2),    X > T2

where A0 = 0, A1 = 4, A2 = 1, A3 = -1, T1 = 8, T2 = 16. Substituting, we get:

(3)  Y = 4*X,                                        X <= 8
     Y = 4*8 + (X - 8) = 24 + X,                     8 < X <= 16
     Y = 4*8 + (16 - 8) - (X - 16) = 56 - X,         X > 16

Our goal is to use PROC NLP (see the code below) to find estimates B0, B1, B2, B3, S1, S2 of the parameters A0, A1, A2, A3, T1, T2, based only on the data for X and Y generated from model (2) by the program in Appendix 1 (the run reported below used A0 = 0, A1 = 4, A2 = 1.5, A3 = 0.2, T1 = 8, T2 = 16). As can be seen from the code, we consider the situation without the over-promotion effect, A3 = 0.2 > 0, so the nonnegativity restriction is imposed on B3. At first, the PROC NLP code did not include any initial values for the parameters. In this situation, PROC NLP automatically assigns random initial values, and the results of the parameter estimation vary with those initial values. If the randomly assigned initial values are not close to the actual values, PROC NLP is not able to find the exact solution. The code below

PROC NLP DATA=DATA_NO_NOISE OUTEST=STATS TECH=NMSIMP MAXFUNC=500000;
   PARAMETERS B0, B1, B2, B3, S1, S2;
   * PARAMETERS B0, B1, B2, B3, S1=5, S2=9;
   MIN F;
   IF X <= S1 THEN DO;
      YY=B0 + B1*X;
      F=(YY-Y)**2;
   END;
   ELSE IF X <= S2 THEN DO;
      YY=B0 + B1*S1 + B2*(X-S1);
      F=(YY-Y)**2;
   END;
   ELSE DO;
      YY=B0 + B1*S1 + B2*(S2-S1) + B3*(X-S2);
      F=(YY-Y)**2;
   END;
   LINCON S1 > 3, S2 > S1+3, B1 > 0, B2 > 0, B1 - B2 > 0, B2 - B3 > 0;
   NLINCON B3>=0;   /**** NO OVER-PROMOTION EFFECT ****/
RUN;

produced very different results each of the ten times it was run. The best solution was:

PROC NLP: Nonlinear Minimization - Optimization Results
Parameter Estimates

 N   Parameter    Estimate     Gradient of Objective Function
 1   B0           -0.6976        0.65593
 2   B1            4.076         0.36660
 3   B2            3.56570       0.306
 4   B3            0.604655      0.3573
 5   S1            4.95498      -0.8738
 6   S2            9.88096       0.05846

Value of Objective Function = 40.468434

where the randomly assigned initial values were B0 = 0.988009007, B1 = 0.935467, B2 = 0.4936586565, B3 = 0.394099066, S1 = 0.8340534, and S2 = 0.487640093. When we supplied plausible initial values for the knots, S1 = 5 and S2 = 9, PROC NLP immediately found an almost exact solution:

PROC NLP: Nonlinear Minimization - Optimization Results
Parameter Estimates

 N   Parameter    Estimate     Gradient of Objective Function
 1   B0           -0.04408      -0.0676
 2   B1            4.009843     -0.48904
 3   B2            1.5494       -0.05357
 4   B3            0.20570      -0.05936
 5   S1            7.97679      -0.36654
 6   S2           15.96853      -0.30566

Value of Objective Function = 0.047445

Since the objective function potentially has many local minima and is non-differentiable at a number of points, the strong dependence of the accuracy of the solution on the initial values is a general problem of nonlinear optimization. The situation becomes even more complex when PROC NLP tries to estimate a response curve without an over-promotion effect from data with an over-promotion effect. Even when there is no noise in the data, this is still a very complicated problem.
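As a quick check on any candidate solution, the curve implied by a set of parameter values can be evaluated in a DATA step and overlaid on the data. The following is a minimal sketch, assuming the DATA_NO_NOISE data set from Appendix 1 and, purely for illustration, the generating values A0 = 0, A1 = 4, A2 = 1.5, A3 = 0.2, S1 = 8, S2 = 16:

DATA CHECK;
   SET DATA_NO_NOISE;
   /* Evaluate model (1) at the assumed parameter values */
   IF X <= 8 THEN YHAT = 0 + 4*X;
   ELSE IF X <= 16 THEN YHAT = 0 + 4*8 + 1.5*(X - 8);
   ELSE YHAT = 0 + 4*8 + 1.5*(16 - 8) + 0.2*(X - 16);
RUN;
PROC GPLOT DATA=CHECK;
   PLOT Y*X YHAT*X / OVERLAY;   /* data and implied curve on one plot */
RUN;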

In real response data, the Signal/(Signal + Noise) ratio is very small, and the problem frequently becomes intractable. Noisy data (see Graph 2) were generated by the code in Appendix 2, with the same parameters A0 = 0, A1 = 4, A2 = 1.5, A3 = 0.2, S1 = 8, S2 = 16 as in the example above (a sketch for estimating this ratio from simulated data is given after Appendix 2). Since the real parameters are unknown in practice, it is very difficult to evaluate the results, but the dependence of the solution on the initial values is even stronger for noisy data. Thus, to overcome this problem, we offer a three-step heuristic approach.

In the first step, we use dummy regression (PROC REG) to estimate the parameters of the dummy variables for X = 1 to X = max[X]. These regression parameters are treated as a time series, where the number of the dummy variable is used instead of the time index. (The code for the dummy regression is in Appendix 3, and the graph of the dummy regression parameter estimates as a series is in Graph 3.)

Second, if we have some expert knowledge about the number of knots and their locations, we can apply the Chow test (PROC AUTOREG) to test the hypothesis about the breakpoints, i.e., the knots (a sketch of this step is given after Appendix 3).

The last step involves estimating the parameters of the piecewise concave response curve from the series data using PROC NLP. Here, the breakpoints found in the second step can be set as the initial values. Although the problem of assigning the initial values remains, the optimization problem becomes significantly simpler.

In our example, we do not know the optimal number of segments; it could be one, two, or three. Thus, we need to run PROC NLP three times and compare the values of the objective function for the cases with zero, one, and two knots (a sketch of the one-knot fit is given after the References). The final response curve consists of the number of segments with the smallest objective function. Comparing the values of the objective function, the optimal number of segments in our example is 2.

Summary

We formulate a new problem statement of response modeling as a concave piecewise approximation of a response curve and use artificial data to illustrate the approach. As the accuracy of the solution depends on the Signal/(Signal + Noise) ratio, we can obtain the exact solution when the ratio is close to one, or when the knots are known. A three-step heuristic approach is suggested for situations where the data are significantly contaminated and/or the knots are unknown.

References

1. SAS Institute Inc. "The NLP Procedure," in SAS/OR User's Guide: Mathematical Programming, Version 8. Cary, NC: SAS Institute Inc., 1999.
2. Lerman, P.M. "Fitting Segmented Regression Models by Grid Search." Applied Statistics, Vol. 29, No. 1 (1980), pp. 77-84.
3. McGee, Victor E. and Willard T. Carleton. "Piecewise Regression." Journal of the American Statistical Association, Vol. 65, No. 331 (Sep. 1970), pp. 1109-1124.
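The following is a minimal sketch of the one-knot (two-segment) fit used in the model comparison step, assuming the DATA_WITH_NORNOISE data set from Appendix 2 and a purely hypothetical initial knot value S1 = 10; the zero-knot case is ordinary least squares and can be fit with PROC REG (MODEL Y = X):

PROC NLP DATA=DATA_WITH_NORNOISE TECH=NMSIMP MAXFUNC=500000;
   PARAMETERS B0, B1, B2, S1=10;
   MIN F;                    /* F is summed over the observations */
   IF X <= S1 THEN DO;
      YY=B0 + B1*X;
   END;
   ELSE DO;
      YY=B0 + B1*S1 + B2*(X-S1);
   END;
   F=(YY-Y)**2;              /* squared residual */
   LINCON S1 > 3, B1 > 0, B2 > 0, B1 - B2 > 0;
RUN;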

APPENDIX 1
Macro for generation of a piecewise linear response curve (no noise).

%MACRO DDATA_NO_NOISE(A0=, A1=, A2=, A3=, S1=, S2=);
DATA DATA_NO_NOISE;
   %DO XX=1 %TO 25;
      X=&XX;
      /**** ONLY ONE PIECE ****/
      %IF &S1=0 %THEN %DO;
         Y=%SYSEVALF(&A0 + %SYSEVALF(&A1*&XX));
      %END;
      /**** TWO PIECES ****/
      %ELSE %IF &S2=0 %THEN %DO;
         %IF &XX<=&S1 %THEN %DO;
            Y=%SYSEVALF(&A0 + %SYSEVALF(&A1*&XX));
         %END;
         %ELSE %DO;
            Y=%SYSEVALF(&A0 + %SYSEVALF(&A1*&S1) + %SYSEVALF(&A2*(&XX-&S1)));
         %END;
      %END;
      /**** THREE PIECES ****/
      %ELSE %IF &XX<=&S1 %THEN %DO;
         Y=%SYSEVALF(&A0 + %SYSEVALF(&A1*&XX));
      %END;
      %ELSE %IF &XX<=&S2 %THEN %DO;
         Y=%SYSEVALF(&A0 + %SYSEVALF(&A1*&S1) + %SYSEVALF(&A2*(&XX-&S1)));
      %END;
      %ELSE %DO;
         Y=%SYSEVALF(&A0 + %SYSEVALF(&A1*&S1) + %SYSEVALF(&A2*(&S2-&S1)) + %SYSEVALF(&A3*(&XX-&S2)));
      %END;
      OUTPUT;
   %END;
RUN;
PROC PRINT DATA=DATA_NO_NOISE;
RUN;
PROC GPLOT DATA=DATA_NO_NOISE;
   PLOT Y*X;
RUN;
%MEND DDATA_NO_NOISE;

%DDATA_NO_NOISE(A0=0, A1=4, A2=1.5, A3=0.2, S1=8, S2=16)
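The same macro can also generate the over-promotion curve of the analytic example (2)-(3); a usage sketch with those values (A3 = -1 < 0):

%DDATA_NO_NOISE(A0=0, A1=4, A2=1, A3=-1, S1=8, S2=16)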

APPENDIX 2
Macro for generation of data with normal noise.

%MACRO DDATA_WITH_NORNOISE(A0=, A1=, A2=, A3=, S1=, S2=);
DATA DATA_WITH_NORNOISE;
   %DO XX=1 %TO 25;
      X=&XX;
      /**** ONLY ONE PIECE ****/
      %IF &S1=0 %THEN %DO;
         YY=%SYSEVALF(&A0 + %SYSEVALF(&A1*&XX));
         DO I=1 TO 5;
            Y=YY + SQRT(5)*RANNOR(345+I*4);
            OUTPUT;
         END;
      %END;
      /**** TWO PIECES ****/
      %ELSE %IF &S2=0 %THEN %DO;
         %IF &XX<=&S1 %THEN %DO;
            YY=%SYSEVALF(&A0 + %SYSEVALF(&A1*&XX));
            DO I=1 TO 5;
               Y=YY + SQRT(5)*RANNOR(345+I*4);
               OUTPUT;
            END;
         %END;
         %ELSE %DO;
            YY=%SYSEVALF(&A0 + %SYSEVALF(&A1*&S1) + %SYSEVALF(&A2*(&XX-&S1)));
            DO I=1 TO 5;
               Y=YY + SQRT(5)*RANNOR(345+I*100);
               OUTPUT;
            END;
         %END;
      %END;
      /**** THREE PIECES ****/
      %ELSE %IF &XX<=&S1 %THEN %DO;
         YY=%SYSEVALF(&A0 + %SYSEVALF(&A1*&XX));
         DO I=1 TO 5;
            Y=YY + .8*SQRT(5)*RANNOR(345+I*4);
            OUTPUT;
         END;
      %END;
      %ELSE %IF &XX<=&S2 %THEN %DO;
         YY=%SYSEVALF(&A0 + %SYSEVALF(&A1*&S1) + %SYSEVALF(&A2*(&XX-&S1)));
         DO I=1 TO 50;
            Y=YY + 4*SQRT(5)*RANNOR(345+I*100) + 2*SQRT(5)*RANNOR(345+I*2);
            OUTPUT;
         END;
      %END;
      %ELSE %DO;
         YY=%SYSEVALF(&A0 + %SYSEVALF(&A1*&S1) + %SYSEVALF(&A2*(&S2-&S1)) + %SYSEVALF(&A3*(&XX-&S2)));
         DO I=1 TO 5;
            Y=YY + 3*SQRT(5)*RANNOR(345+I*5);
            OUTPUT;
         END;
      %END;
   %END;
RUN;

DATA DATA_WITH_NORNOISE;
   SET DATA_WITH_NORNOISE (WHERE=(Y>=0));
   DROP YY I;
RUN;
PROC PRINT DATA=DATA_WITH_NORNOISE;
RUN;
PROC GPLOT DATA=DATA_WITH_NORNOISE;
   PLOT Y*X;
RUN;
%MEND DDATA_WITH_NORNOISE;

%DDATA_WITH_NORNOISE(A0=0, A1=4, A2=1.5, A3=0.2, S1=8, S2=16)
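Because the behavior of the fit depends on the Signal/(Signal + Noise) ratio, it is useful to estimate that ratio for the simulated data. Since several Y values are generated for each value of X, the ratio can be estimated as the between-X share of the total variance of Y, which is the R-square of a one-way model with X as a classification variable. The following is a minimal sketch, assuming the DATA_WITH_NORNOISE data set above:

PROC GLM DATA=DATA_WITH_NORNOISE;
   CLASS X;        /* treat each promotion level as a cell */
   MODEL Y = X;    /* the reported R-square estimates Signal/(Signal + Noise) */
RUN;
QUIT;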

APPENDIX 3
Dummy Regression.

DATA DATA_FOR_DUMMY_REG;   /***** CREATE A DUMMY VARIABLE FOR EACH VALUE OF PROMOTION EFFORT *****/
   ARRAY DUMMY(25);
   SET DATA_WITH_NORNOISE;
   DO I=1 TO 25;
      * IF X = I THEN DUMMY(I)=1;
      * ELSE DUMMY(I)=0;
      DUMMY(I)=(X=I);
   END;
   DROP I;
RUN;
PROC PRINT DATA=DATA_FOR_DUMMY_REG (OBS=50);
RUN;

ODS OUTPUT PARAMETERESTIMATES=PPP;
PROC REG DATA=DATA_FOR_DUMMY_REG;   /***** DUMMY REGRESSION *****/
   MODEL Y=DUMMY1-DUMMY24;
RUN;
QUIT;

DATA PPP;
   SET PPP;
   KEEP Variable Estimate;
RUN;
PROC PRINT DATA=PPP;
RUN;

PROC TRANSPOSE DATA=PPP OUT=TTT PREFIX=COEFF;   /*** ONE COLUMN PER COEFFICIENT; COEFF1=INTERCEPT ***/
RUN;
PROC PRINT DATA=TTT;
RUN;

DATA COEFS;   /**** COEFFICIENTS ADJUSTMENT ****/
   ARRAY PARMS(24) PARMS1-PARMS24;
   ARRAY COEFF(25) COEFF1-COEFF25;
   SET TTT;
   PARMS0=COEFF1;
   DO I=1 TO 24;
      PARMS(I)=PARMS0+COEFF(I+1);
   END;
   DROP COEFF1-COEFF25 I _NAME_ _LABEL_ PARMS0;
RUN;
PROC PRINT DATA=COEFS;
RUN;

PROC TRANSPOSE DATA=COEFS OUT=SERIES;   /**** COEFFICIENTS AS A "TIME SERIES" ****/
RUN;
DATA SERIES;
   SET SERIES (RENAME=(COL1=Y));
   X+1;
   DROP _NAME_;
RUN;
PROC PRINT DATA=SERIES;
RUN;
PROC GPLOT DATA=SERIES;
   PLOT Y*X;
RUN;
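For step two of the heuristic, the Chow test can be run on the SERIES data set produced above. The following is a minimal sketch, assuming candidate breakpoints at observations 8 and 16 (the knots used to generate the data):

PROC AUTOREG DATA=SERIES;
   MODEL Y = X / CHOW=(8 16);   /* Chow tests for structural breaks at X=8 and X=16 */
RUN;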

Graph 1: The three-piece response curve. The x-axis is the promotion effort and the y-axis is the response.

Graph 2: Simulated data similar to real-world data. The x-axis is the promotion effort and the y-axis is the response.

Graph 3: The series of dummy regression parameter estimates. The x-axis is the promotion effort and the y-axis is the estimated response.