The Piecewise Regression Model as a Response Modeling Tool


Eugene Brusilovskiy
University of Pennsylvania, Philadelphia, PA

Abstract

The general problem in response modeling is to identify a response curve and estimate the diminishing returns effect. Different approaches to response modeling in SAS are reviewed, with emphasis on their caveats (OLS segmented regression, robust regression, neural nets, and nonparametric regression). In this paper, we formulate a new problem statement for response modeling: a concave piecewise approximation of the response curve. We use artificial data to illustrate the approach. Because the accuracy of the solution depends on the signal/(signal + noise) ratio, we can obtain the exact solution when the ratio is close to one, or when the knots are known. For situations where the data are significantly contaminated and/or the knots are unknown, a three-step heuristic approach is suggested. First, we run a dummy regression with PROC REG to estimate the parameters of the dummy variables. Then we test for a structural break in the series of estimated parameters from the dummy regression, using the Chow test in PROC AUTOREG or PROC MODEL. If a change point is identified, it is treated as a knot, or as a first approximation of the knot coordinates, in the piecewise regression. Finally, we fit a piecewise concave regression with PROC NLP.

Introduction

Promotion response modeling (PRM) is a necessary decision support tool in today's highly competitive market. Since the consequences (cost, for instance) of wrong decisions are increasing, so is the role of promotion response modeling. PRM is industry-specific: for example, response modeling in the credit card industry is essentially different from that in the pharmaceutical industry. In this paper, we concentrate on the latter, where a response curve is used for evaluating the effectiveness of a promotion campaign, allocating the sales force, developing the optimal marketing mix, etc. Each of these problems may require its own definition of the response curve. In general, however, a promotion response curve is the result of PRM and can be defined as a mathematical construct (depending on the nature of its application) that relates a promotional effort to its response. In the pharmaceutical industry, the response could be the number of new prescriptions, and the promotion effort could be doctor detailing, controlling for all other promotion efforts.

As stated above, an adequate definition of the response curve is very important. Real promotion campaign data are very noisy, and the relationships between promotional efforts and the responses to them are very weak. Moreover, it is necessary to take into account the diminishing returns and the monotonically increasing nature of the response curve. In other words, we assume that the higher the promotion effort, the higher the response, up to some point where the over-promotion effect may kick in. Nonparametric regression would not be helpful here, because the resulting response curve will, as a rule, have multiple maxima and minima. Therefore, we formulate the problem of response modeling as a problem of nonlinear optimization with linear and nonlinear constraints. Specifically, we have to find a concave piecewise linear approximation of the relationship between the response and the promotion efforts. Concavity and monotonicity are necessary to reflect the diminishing returns in the resulting response curve.
The piecewise linearity condition has to hold for the sake of simplicity of the subsequent steps of decision support, which require optimization.

Problem Statement

To keep things simple, we consider only three pieces in the concave piecewise linear approximation of the response curve (see Graph 1). Note that a product is over-promoted when the marginal response becomes negative. Many authors have considered the problem of piecewise linear approximation (see, for example, (2) and (3)).

In this paper, we impose a concavity restriction that is not present in these works; moreover, we want to solve the problem in SAS. Let Y be the response, X the promotion effort, and S1 and S2 the first and second knots, respectively. Then, based on promotional data, we need to find the set of unknown parameters B0, B1, B2, B3, S1, S2 that minimizes the objective function (the sum of squared residuals), where

(1)  Y = B0 + B1*X,                                  X <= S1
     Y = B0 + B1*S1 + B2*(X - S1),                   S1 < X <= S2
     Y = B0 + B1*S1 + B2*(S2 - S1) + B3*(X - S2),    X > S2

and B1 > 0, B2 > 0, B1 - B2 > 0, B2 - B3 > 0, S1 > 0, and S2 - S1 > 0.

This formulation allows B3 to be negative. In that case, the response goes down as the promotion effort goes up; this is the over-promotion effect mentioned above. If we believe that there is no over-promotion effect, we have to impose the additional restriction that B3 be nonnegative. By this definition, the response curve Y is continuous, but not differentiable at the knots. Mathematically, the problem of finding the response curve Y is a nonlinear programming problem with a continuous but non-differentiable objective function and with linear and nonlinear constraints. This type of problem can be solved by the SAS/OR Non-Linear Programming procedure (PROC NLP). PROC NLP offers several optimization algorithms, but only one of them can be used here, the Nelder-Mead simplex method (NMSIMP): it is the only algorithm that requires no first- or second-order derivatives and that allows boundary constraints, linear constraints, and nonlinear constraints (1).

Caveats and the Solution

The problem of finding the concave piecewise linear response curve looks very simple, but this is just an illusion. Even when the data do not contain any noise, the estimation of the concave piecewise linear response curve does not always lead to a precise solution. Consider the following example, where we assume that the response curve consists of three linear pieces:

(2)  Y = A0 + A1*X,                                  X <= T1
     Y = A0 + A1*T1 + A2*(X - T1),                   T1 < X <= T2
     Y = A0 + A1*T1 + A2*(T2 - T1) + A3*(X - T2),    X > T2

where A0 = 0, A1 = 4, A2 = 1, A3 = -1, T1 = 8, T2 = 16. Substituting, we get:

(3)  Y = 4*X,                                        X <= 8
     Y = 4*8 + (X - 8) = 24 + X,                     8 < X <= 16
     Y = 4*8 + (16 - 8) - (X - 16) = 56 - X,         X > 16

Our goal is to use PROC NLP (see the code below) to find estimates B0, B1, B2, B3, S1, S2 of the parameters A0, A1, A2, A3, T1, T2, based only on the data for X and Y generated from model (2) by the program in Appendix 1 (the run reported below used A0 = 0, A1 = 4, A2 = 1.5, A3 = 0.2, T1 = 8, T2 = 16). As can be seen from the code, we consider the situation without the over-promotion effect, A3 = 0.2 > 0, so the nonnegativity restriction is imposed on B3. At first, the PROC NLP code did not include any initial values for the parameters. In this situation, PROC NLP automatically assigns random initial values, and the results of the parameter estimation vary with those initial values. If the randomly assigned initial values are not close to the actual values, PROC NLP is not able to find the exact solution. The code below

PROC NLP DATA=DATA_NO_NOISE OUTEST=STATS TECH=NMSIMP MAXFUNC=500000;
   PARAMETERS B0, B1, B2, B3, S1, S2;
   * PARAMETERS B0, B1, B2, B3, S1=5, S2=9;
   MIN F;
   IF X <= S1 THEN DO;
      YY=B0 + B1*X;
      F=(YY-Y)**2;
   END;
   ELSE IF X <= S2 THEN DO;
      YY=B0 + B1*S1 + B2*(X-S1);
      F=(YY-Y)**2;
   END;
   ELSE DO;
      YY=B0 + B1*S1 + B2*(S2-S1) + B3*(X-S2);
      F=(YY-Y)**2;
   END;
   LINCON S1 > 3, S2 > S1+3, B1 > 0, B2 > 0, B1 - B2 > 0, B2 - B3 > 0;
   NLINCON B3>=0;   /**** NO OVER-PROMOTION EFFECT ****/
RUN;

produced very different results each of the ten times it was run. The best solution was:

PROC NLP: Nonlinear Minimization - Optimization Results
Parameter Estimates

 N   Parameter    Estimate     Gradient of Objective Function
 1   B0           -0.6976        0.65593
 2   B1            4.076         0.36660
 3   B2            3.56570       0.306
 4   B3            0.604655      0.3573
 5   S1            4.95498      -0.8738
 6   S2            9.88096       0.05846

Value of Objective Function = 40.468434

where the randomly assigned initial values were B0 = 0.988009007, B1 = 0.935467, B2 = 0.4936586565, B3 = 0.394099066, S1 = 0.8340534, and S2 = 0.487640093. When we supplied plausible initial values for the knots, S1 = 5 and S2 = 9, PROC NLP immediately found an almost exact solution:

PROC NLP: Nonlinear Minimization - Optimization Results
Parameter Estimates

 N   Parameter    Estimate     Gradient of Objective Function
 1   B0           -0.04408      -0.0676
 2   B1            4.009843     -0.48904
 3   B2            1.5494       -0.05357
 4   B3            0.20570      -0.05936
 5   S1            7.97679      -0.36654
 6   S2           15.96853      -0.30566

Value of Objective Function = 0.047445

Since the objective function potentially has many local minima and is non-differentiable at a number of points, the strong dependence of the accuracy of the solution on the initial values is a general problem of nonlinear optimization. The situation becomes even more complex when PROC NLP tries to estimate a response curve without an over-promotion effect from data with an over-promotion effect. Even when there is no noise in the data, this is still a very complicated problem.
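As a quick check on any candidate solution, the curve implied by a set of parameter values can be evaluated in a DATA step and overlaid on the data. The following is a minimal sketch, assuming the DATA_NO_NOISE data set from Appendix 1 and, purely for illustration, the generating values A0 = 0, A1 = 4, A2 = 1.5, A3 = 0.2, S1 = 8, S2 = 16:

DATA CHECK;
   SET DATA_NO_NOISE;
   /* Evaluate model (1) at the assumed parameter values */
   IF X <= 8 THEN YHAT = 0 + 4*X;
   ELSE IF X <= 16 THEN YHAT = 0 + 4*8 + 1.5*(X - 8);
   ELSE YHAT = 0 + 4*8 + 1.5*(16 - 8) + 0.2*(X - 16);
RUN;
PROC GPLOT DATA=CHECK;
   PLOT Y*X YHAT*X / OVERLAY;   /* data and implied curve on one plot */
RUN;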

In real response data, the Signal/(Signal + Noise) ratio is very small, and the problem frequently becomes intractable. Noisy data (see Graph 2) were generated by the code in Appendix 2, with the same parameters A0 = 0, A1 = 4, A2 = 1.5, A3 = 0.2, S1 = 8, S2 = 16 as in the example above (a sketch for estimating this ratio from simulated data is given after Appendix 2). Since the real parameters are unknown in practice, it is very difficult to evaluate the results, but the dependence of the solution on the initial values is even stronger for noisy data. Thus, to overcome this problem, we offer a three-step heuristic approach.

In the first step, we use dummy regression (PROC REG) to estimate the parameters of the dummy variables for X = 1 to X = max[X]. These regression parameters are treated as a time series, where the number of the dummy variable is used instead of the time index. (The code for the dummy regression is in Appendix 3, and the graph of the dummy regression parameter estimates as a series is in Graph 3.)

Second, if we have some expert knowledge about the number of knots and their locations, we can apply the Chow test (PROC AUTOREG) to test the hypothesis about the breakpoints, i.e., the knots (a sketch of this step is given after Appendix 3).

The last step involves estimating the parameters of the piecewise concave response curve from the series data using PROC NLP. Here, the breakpoints found in the second step can be set as the initial values. Although the problem of assigning the initial values remains, the optimization problem becomes significantly simpler.

In our example, we do not know the optimal number of segments; it could be one, two, or three. Thus, we need to run PROC NLP three times and compare the values of the objective function for the cases with zero, one, and two knots (a sketch of the one-knot fit is given after the References). The final response curve consists of the number of segments with the smallest objective function. Comparing the values of the objective function, the optimal number of segments in our example is 2.

Summary

We formulate a new problem statement of response modeling as a concave piecewise approximation of a response curve and use artificial data to illustrate the approach. As the accuracy of the solution depends on the Signal/(Signal + Noise) ratio, we can obtain the exact solution when the ratio is close to one, or when the knots are known. A three-step heuristic approach is suggested for situations where the data are significantly contaminated and/or the knots are unknown.

References

1. SAS Institute Inc. "The NLP Procedure," in SAS/OR User's Guide: Mathematical Programming, Version 8. Cary, NC: SAS Institute Inc., 1999.
2. Lerman, P.M. "Fitting Segmented Regression Models by Grid Search." Applied Statistics, Vol. 29, No. 1 (1980), pp. 77-84.
3. McGee, Victor E. and Willard T. Carleton. "Piecewise Regression." Journal of the American Statistical Association, Vol. 65, No. 331 (Sep. 1970), pp. 1109-1124.
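The following is a minimal sketch of the one-knot (two-segment) fit used in the model comparison step, assuming the DATA_WITH_NORNOISE data set from Appendix 2 and a purely hypothetical initial knot value S1 = 10; the zero-knot case is ordinary least squares and can be fit with PROC REG (MODEL Y = X):

PROC NLP DATA=DATA_WITH_NORNOISE TECH=NMSIMP MAXFUNC=500000;
   PARAMETERS B0, B1, B2, S1=10;
   MIN F;                    /* F is summed over the observations */
   IF X <= S1 THEN DO;
      YY=B0 + B1*X;
   END;
   ELSE DO;
      YY=B0 + B1*S1 + B2*(X-S1);
   END;
   F=(YY-Y)**2;              /* squared residual */
   LINCON S1 > 3, B1 > 0, B2 > 0, B1 - B2 > 0;
RUN;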

APPENDIX 1
Macro for generation of a piecewise linear response curve (no noise).

%MACRO DDATA_NO_NOISE(A0=, A1=, A2=, A3=, S1=, S2=);
DATA DATA_NO_NOISE;
   %DO XX=1 %TO 25;
      X=&XX;
      /**** ONLY ONE PIECE ****/
      %IF &S1=0 %THEN %DO;
         Y=%SYSEVALF(&A0 + %SYSEVALF(&A1*&XX));
      %END;
      /**** TWO PIECES ****/
      %ELSE %IF &S2=0 %THEN %DO;
         %IF &XX<=&S1 %THEN %DO;
            Y=%SYSEVALF(&A0 + %SYSEVALF(&A1*&XX));
         %END;
         %ELSE %DO;
            Y=%SYSEVALF(&A0 + %SYSEVALF(&A1*&S1) + %SYSEVALF(&A2*(&XX-&S1)));
         %END;
      %END;
      /**** THREE PIECES ****/
      %ELSE %IF &XX<=&S1 %THEN %DO;
         Y=%SYSEVALF(&A0 + %SYSEVALF(&A1*&XX));
      %END;
      %ELSE %IF &XX<=&S2 %THEN %DO;
         Y=%SYSEVALF(&A0 + %SYSEVALF(&A1*&S1) + %SYSEVALF(&A2*(&XX-&S1)));
      %END;
      %ELSE %DO;
         Y=%SYSEVALF(&A0 + %SYSEVALF(&A1*&S1) + %SYSEVALF(&A2*(&S2-&S1)) + %SYSEVALF(&A3*(&XX-&S2)));
      %END;
      OUTPUT;
   %END;
RUN;
PROC PRINT DATA=DATA_NO_NOISE;
RUN;
PROC GPLOT DATA=DATA_NO_NOISE;
   PLOT Y*X;
RUN;
%MEND DDATA_NO_NOISE;

%DDATA_NO_NOISE(A0=0, A1=4, A2=1.5, A3=0.2, S1=8, S2=16)
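The same macro can also generate the over-promotion curve of the analytic example (2)-(3); a usage sketch with those values (A3 = -1 < 0):

%DDATA_NO_NOISE(A0=0, A1=4, A2=1, A3=-1, S1=8, S2=16)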

APPENDIX 2
Macro for generation of data with normal noise.

%MACRO DDATA_WITH_NORNOISE(A0=, A1=, A2=, A3=, S1=, S2=);
DATA DATA_WITH_NORNOISE;
   %DO XX=1 %TO 25;
      X=&XX;
      /**** ONLY ONE PIECE ****/
      %IF &S1=0 %THEN %DO;
         YY=%SYSEVALF(&A0 + %SYSEVALF(&A1*&XX));
         DO I=1 TO 5;
            Y=YY + SQRT(5)*RANNOR(345+I*4);
            OUTPUT;
         END;
      %END;
      /**** TWO PIECES ****/
      %ELSE %IF &S2=0 %THEN %DO;
         %IF &XX<=&S1 %THEN %DO;
            YY=%SYSEVALF(&A0 + %SYSEVALF(&A1*&XX));
            DO I=1 TO 5;
               Y=YY + SQRT(5)*RANNOR(345+I*4);
               OUTPUT;
            END;
         %END;
         %ELSE %DO;
            YY=%SYSEVALF(&A0 + %SYSEVALF(&A1*&S1) + %SYSEVALF(&A2*(&XX-&S1)));
            DO I=1 TO 5;
               Y=YY + SQRT(5)*RANNOR(345+I*100);
               OUTPUT;
            END;
         %END;
      %END;
      /**** THREE PIECES ****/
      %ELSE %IF &XX<=&S1 %THEN %DO;
         YY=%SYSEVALF(&A0 + %SYSEVALF(&A1*&XX));
         DO I=1 TO 5;
            Y=YY + .8*SQRT(5)*RANNOR(345+I*4);
            OUTPUT;
         END;
      %END;
      %ELSE %IF &XX<=&S2 %THEN %DO;
         YY=%SYSEVALF(&A0 + %SYSEVALF(&A1*&S1) + %SYSEVALF(&A2*(&XX-&S1)));
         DO I=1 TO 50;
            Y=YY + 4*SQRT(5)*RANNOR(345+I*100) + 2*SQRT(5)*RANNOR(345+I*2);
            OUTPUT;
         END;
      %END;
      %ELSE %DO;
         YY=%SYSEVALF(&A0 + %SYSEVALF(&A1*&S1) + %SYSEVALF(&A2*(&S2-&S1)) + %SYSEVALF(&A3*(&XX-&S2)));
         DO I=1 TO 5;
            Y=YY + 3*SQRT(5)*RANNOR(345+I*5);
            OUTPUT;
         END;
      %END;
   %END;
RUN;

DATA DATA_WITH_NORNOISE;
   SET DATA_WITH_NORNOISE (WHERE=(Y>=0));
   DROP YY I;
RUN;
PROC PRINT DATA=DATA_WITH_NORNOISE;
RUN;
PROC GPLOT DATA=DATA_WITH_NORNOISE;
   PLOT Y*X;
RUN;
%MEND DDATA_WITH_NORNOISE;

%DDATA_WITH_NORNOISE(A0=0, A1=4, A2=1.5, A3=0.2, S1=8, S2=16)
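Because the behavior of the fit depends on the Signal/(Signal + Noise) ratio, it is useful to estimate that ratio for the simulated data. Since several Y values are generated for each value of X, the ratio can be estimated as the between-X share of the total variance of Y, which is the R-square of a one-way model with X as a classification variable. The following is a minimal sketch, assuming the DATA_WITH_NORNOISE data set above:

PROC GLM DATA=DATA_WITH_NORNOISE;
   CLASS X;        /* treat each promotion level as a cell */
   MODEL Y = X;    /* the reported R-square estimates Signal/(Signal + Noise) */
RUN;
QUIT;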

APPENDIX 3
Dummy Regression.

DATA DATA_FOR_DUMMY_REG;   /***** CREATE A DUMMY VARIABLE FOR EACH VALUE OF PROMOTION EFFORT *****/
   ARRAY DUMMY(25);
   SET DATA_WITH_NORNOISE;
   DO I=1 TO 25;
      * IF X = I THEN DUMMY(I)=1;
      * ELSE DUMMY(I)=0;
      DUMMY(I)=(X=I);
   END;
   DROP I;
RUN;
PROC PRINT DATA=DATA_FOR_DUMMY_REG (OBS=50);
RUN;

ODS OUTPUT PARAMETERESTIMATES=PPP;
PROC REG DATA=DATA_FOR_DUMMY_REG;   /***** DUMMY REGRESSION *****/
   MODEL Y=DUMMY1-DUMMY24;
RUN;
QUIT;

DATA PPP;
   SET PPP;
   KEEP Variable Estimate;
RUN;
PROC PRINT DATA=PPP;
RUN;

PROC TRANSPOSE DATA=PPP OUT=TTT PREFIX=COEFF;   /*** ONE COLUMN PER COEFFICIENT; COEFF1=INTERCEPT ***/
RUN;
PROC PRINT DATA=TTT;
RUN;

DATA COEFS;   /**** COEFFICIENTS ADJUSTMENT ****/
   ARRAY PARMS(24) PARMS1-PARMS24;
   ARRAY COEFF(25) COEFF1-COEFF25;
   SET TTT;
   PARMS0=COEFF1;
   DO I=1 TO 24;
      PARMS(I)=PARMS0+COEFF(I+1);
   END;
   DROP COEFF1-COEFF25 I _NAME_ _LABEL_ PARMS0;
RUN;
PROC PRINT DATA=COEFS;
RUN;

PROC TRANSPOSE DATA=COEFS OUT=SERIES;   /**** COEFFICIENTS AS A "TIME SERIES" ****/
RUN;
DATA SERIES;
   SET SERIES (RENAME=(COL1=Y));
   X+1;
   DROP _NAME_;
RUN;
PROC PRINT DATA=SERIES;
RUN;
PROC GPLOT DATA=SERIES;
   PLOT Y*X;
RUN;
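For step two of the heuristic, the Chow test can be run on the SERIES data set produced above. The following is a minimal sketch, assuming candidate breakpoints at observations 8 and 16 (the knots used to generate the data):

PROC AUTOREG DATA=SERIES;
   MODEL Y = X / CHOW=(8 16);   /* Chow tests for structural breaks at X=8 and X=16 */
RUN;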

Graph 1: The three-piece response curve. The x-axis is the promotion effort and the y-axis is the response.

Graph 2: Simulated data similar to real-world data. The x-axis is the promotion effort and the y-axis is the response.

Graph 3: The series of dummy regression parameter estimates. The x-axis is the promotion effort and the y-axis is the estimated response.