piecewise ginireg 1 Piecewise Gini Regressions in Stata Jan Ditzen 1 Shlomo Yitzhaki 2 September 8, 2017

Similar documents
Bivariate (Simple) Regression Analysis

Week 4: Simple Linear Regression II

Lab 2: OLS regression

Week 10: Heteroskedasticity II

Week 4: Simple Linear Regression III

Week 11: Interpretation plus

schooling.log 7/5/2006

Two-Stage Least Squares

Week 9: Modeling II. Marcelo Coca Perraillon. Health Services Research Methods I HSMP University of Colorado Anschutz Medical Campus

Introduction to Hierarchical Linear Model. Hsueh-Sheng Wu CFDR Workshop Series January 30, 2017

THE LINEAR PROBABILITY MODEL: USING LEAST SQUARES TO ESTIMATE A REGRESSION EQUATION WITH A DICHOTOMOUS DEPENDENT VARIABLE

Panel Data 4: Fixed Effects vs Random Effects Models

THIS IS NOT REPRESNTATIVE OF CURRENT CLASS MATERIAL. STOR 455 Midterm 1 September 28, 2010

range: [1,20] units: 1 unique values: 20 missing.: 0/20 percentiles: 10% 25% 50% 75% 90%

Stata Session 2. Tarjei Havnes. University of Oslo. Statistics Norway. ECON 4136, UiO, 2012

Heteroskedasticity and Homoskedasticity, and Homoskedasticity-Only Standard Errors

Week 5: Multiple Linear Regression II

PubHlth 640 Intermediate Biostatistics Unit 2 - Regression and Correlation. Simple Linear Regression Software: Stata v 10.1

Labor Economics with STATA. Estimating the Human Capital Model Using Artificial Data

Instrumental variables, bootstrapping, and generalized linear models

Stata Training. AGRODEP Technical Note 08. April Manuel Barron and Pia Basurto

Bivariate Linear Regression James M. Murray, Ph.D. University of Wisconsin - La Crosse Updated: October 04, 2017

Missing Data Analysis for the Employee Dataset

Outline. Topic 16 - Other Remedies. Ridge Regression. Ridge Regression. Ridge Regression. Robust Regression. Regression Trees. Piecewise Linear Model

Introduction to Stata: An In-class Tutorial

SLStats.notebook. January 12, Statistics:

GLM II. Basic Modeling Strategy CAS Ratemaking and Product Management Seminar by Paul Bailey. March 10, 2015

Further Maths Notes. Common Mistakes. Read the bold words in the exam! Always check data entry. Write equations in terms of variables

ST512. Fall Quarter, Exam 1. Directions: Answer questions as directed. Please show work. For true/false questions, circle either true or false.

xsmle - A Command to Estimate Spatial Panel Data Models in Stata

ZunZun.com. User-Selectable Polynomial. Sat Jan 14 09:49: local server time

. predict mod1. graph mod1 ed, connect(l) xlabel ylabel l1(model1 predicted income) b1(years of education)

An Introductory Guide to Stata

Stat 500 lab notes c Philip M. Dixon, Week 10: Autocorrelated errors

/23/2004 TA : Jiyoon Kim. Recitation Note 1

Subset Selection in Multiple Regression

STAT 2607 REVIEW PROBLEMS Word problems must be answered in words of the problem.

SOCY7706: Longitudinal Data Analysis Instructor: Natasha Sarkisian. Panel Data Analysis: Fixed Effects Models

DS Machine Learning and Data Mining I. Alina Oprea Associate Professor, CCIS Northeastern University

Introduction to STATA 6.0 ECONOMICS 626

Linear regression Number of obs = 6,866 F(16, 326) = Prob > F = R-squared = Root MSE =

Robust Linear Regression (Passing- Bablok Median-Slope)

1.1 Pearson Modeling and Equation Solving

Stat 5100 Handout #6 SAS: Linear Regression Remedial Measures

Review of Stata II AERC Training Workshop Nairobi, May 2002

Multivariate Analysis Multivariate Calibration part 2

Big Data Methods. Chapter 5: Machine learning. Big Data Methods, Chapter 5, Slide 1

Introduction to Programming in Stata

May 24, Emil Coman 1 Yinghui Duan 2 Daren Anderson 3

PSY 9556B (Feb 5) Latent Growth Modeling

Study Guide. Module 1. Key Terms

Multiple-imputation analysis using Stata s mi command

ECON Introductory Econometrics Seminar 4

Resources for statistical assistance. Quantitative covariates and regression analysis. Methods for predicting continuous outcomes.

Multiple Regression White paper

STA 570 Spring Lecture 5 Tuesday, Feb 1

Computing Murphy Topel-corrected variances in a heckprobit model with endogeneity

Predictive Analysis: Evaluation and Experimentation. Heejun Kim

Source:

Regression Analysis and Linear Regression Models

25 Working with categorical data and factor variables

Statistical Analysis of Metabolomics Data. Xiuxia Du Department of Bioinformatics & Genomics University of North Carolina at Charlotte

Assignment No: 2. Assessment as per Schedule. Specifications Readability Assignments

Robust Regression. Robust Data Mining Techniques By Boonyakorn Jantaranuson

The Piecewise Regression Model as a Response Modeling Tool

Linear Regression & Gradient Descent

Linear Functions. College Algebra

Box-Cox Transformation for Simple Linear Regression

Applied Regression Modeling: A Business Approach

Getting started with Stata 2017: Cheat-sheet

CHAPTER 3 AN OVERVIEW OF DESIGN OF EXPERIMENTS AND RESPONSE SURFACE METHODOLOGY

Applied Statistics and Econometrics Lecture 6

Health Disparities (HD): It s just about comparing two groups

Conditional and Unconditional Regression with No Measurement Error

An Introduction to Stata Part II: Data Analysis

IQR = number. summary: largest. = 2. Upper half: Q3 =

Graphical Analysis of Data using Microsoft Excel [2016 Version]

NCSS Statistical Software. Robust Regression

1 Downloading files and accessing SAS. 2 Sorting, scatterplots, correlation and regression

Nina Zumel and John Mount Win-Vector LLC

Missing Data Analysis for the Employee Dataset

MS&E 226: Small Data

MDM 4UI: Unit 8 Day 2: Regression and Correlation

Statistics Lab #7 ANOVA Part 2 & ANCOVA

Learner Expectations UNIT 1: GRAPICAL AND NUMERIC REPRESENTATIONS OF DATA. Sept. Fathom Lab: Distributions and Best Methods of Display

CH5: CORR & SIMPLE LINEAR REFRESSION =======================================

UNIT 1: NUMBER LINES, INTERVALS, AND SETS

Bootstrapping Method for 14 June 2016 R. Russell Rhinehart. Bootstrapping

Part I, Chapters 4 & 5. Data Tables and Data Analysis Statistics and Figures

Instruction on JMP IN of Chapter 19

8. MINITAB COMMANDS WEEK-BY-WEEK

Machine Learning and Bioinformatics 機器學習與生物資訊學

SAS (Statistical Analysis Software/System)

Section 3.4: Diagnostics and Transformations. Jared S. Murray The University of Texas at Austin McCombs School of Business

Statistical Matching using Fractional Imputation

Cross-validation and the Bootstrap

Beta-Regression with SPSS Michael Smithson School of Psychology, The Australian National University

Chapter 9 Robust Regression Examples

Evaluation Measures. Sebastian Pölsterl. April 28, Computer Aided Medical Procedures Technische Universität München

Mathematics. Algebra, Functions, and Data Analysis Curriculum Guide. Revised 2010

Transcription:

piecewise ginireg 1 Piecewise Gini Regressions in Stata Jan Ditzen 1 Shlomo Yitzhaki 2 1 Heriot-Watt University, Edinburgh, UK Center for Energy Economics Research and Policy (CEERP) 2 The Hebrew University and Hadassah Academic College, Jerusalem, Israel 1 Name subject to changes... September 8, 2017 Jan Ditzen (Heriot-Watt University) piecewise ginireg 8. September 2017 1 / 24

Note This slide was added after the presentation at the Stata User Group Meeting in London. As of 11. September 2017 picewise ginireg is not available on SSC or publicly otherwise. For inquiries, questions or comments, please write me at j.ditzen@hw.ac.uk or see www.jan.ditzen.net Jan Ditzen (Heriot-Watt University) piecewise ginireg 8. September 2017 2 / 24

Introduction OLS requires... 1... linear relationship between conditional expectation of the dependent variable and explanatory variables and... 2... errors are iid and uncorrelated with the independent variables. Often monotonic transformations are applied to linearize the model, can lead to changes of the sign of the estimated coefficients. OLS sensitive to outliers. Jan Ditzen (Heriot-Watt University) piecewise ginireg 8. September 2017 3 / 24

Gini Regressions Basics Idea: replace the (co-)variance in an OLS regression with the Gini notion of (co-)variance, i.e. the Gini s Mean Difference (GMD) as the measure of dispersion. Gini Mean Difference: G YX = E Y X with gini covariance: Gcov(Y, X ) = cov (Y, F (X )), where F (X ) is the cumulative population distribution function. Regressor β G = cov(y,f (X )) cov(x,f (X )). Can be interpreted as an IV regression, with F (X ) as an instrument for X. Jan Ditzen (Heriot-Watt University) piecewise ginireg 8. September 2017 4 / 24

Gini Regressions Advantages of Gini Regressions Gini regressions do not rely on Symmetric correlation and variability measure Linearity of the model. Coefficients do not change after monotonic transformations of the explanatory or independent variables. GMD here definition has two asymmetric correlation coefficients, one can be used for the regression, the other can be used to test the linearity assumption. Summarized in Yitzhaki and Schechtman (2013); Yitzhaki (2015). Jan Ditzen (Heriot-Watt University) piecewise ginireg 8. September 2017 5 / 24

Example mroz.dta Dataset Estimate log wage using education. wage Jan Ditzen (Heriot-Watt University) piecewise ginireg 8. September 2017 6 / 24

Estimation in Stata ginireg (Schaffer, 2015) Package to estimate gini regressions. Allows for extended and mixed Gini regressions and IV regressions. Post estimation commands allow prediction of residuals and fitted values, and calculation of LMA curve. Includes ginilma to graph Gini LMA and NLMA curves. Jan Ditzen (Heriot-Watt University) piecewise ginireg 8. September 2017 7 / 24

Example. use http://fmwww.bc.edu/ec-p/data/wooldridge/mroz.dta, clear. reg lwage educ Source SS df MS Number of obs = 428 F(1, 426) = 56.93 Model 26.3264237 1 26.3264237 Prob > F = 0.0000 Residual 197.001028 426.462443727 R-squared = 0.1179 Adj R-squared = 0.1158 Total 223.327451 427.523015108 Root MSE =.68003 lwage Coef. Std. Err. t P> t [95% Conf. Interval] educ.1086487.0143998 7.55 0.000.0803451.1369523 _cons -.1851969.1852259-1.00 0.318 -.5492674.1788735. ginireg lwage educ Gini regression Number of obs = 428 GR = 0.321 Gamma YYhat = 0.319 Gamma YhatY = 0.450 lwage Coef. Std. Err. z P> z [95% Conf. Interval] educ.105074.0150097 7.00 0.000.0756556.1344924 _cons -.1399459.1928283-0.73 0.468 -.5178824.2379906 Gini regressors: educ Least squares regressors: _cons One additional year of education increases the hourly wage by 10.9% (OLS) and by 10.5% (gini). Jan Ditzen (Heriot-Watt University) piecewise ginireg 8. September 2017 8 / 24

Gini Regressions Line of independence minus absolute concentration curve (LMA) LMA defined as LOI ACC: Line of Independence (LOI) is a straight line from (0, 0) to (µy, 1), represents statistical independence between X and Y. LOI (p) = µ y p. Absolute concentration curve ACC(p) = x p g(t)df (t), where g(x) represents the regression curve. Properties: Starts at (0, 0) and ends at (1, 0). If it is above (below) the horizontal axis, section contributes positive (negative) to the regression coefficient. If intersects the horizontal axis, then the sign of an OLS regression coefficient can change if there is a monotonic increasing transformation of X. If curve is concave (convex, straight line), then the local regression coefficient is decreasing (increasing, constant). The LMA allows an interpretation of how the Gini covariance is composed and thus how the coefficients are effected as it includes the Gcov(Y, X ). Jan Ditzen (Heriot-Watt University) piecewise ginireg 8. September 2017 9 / 24

Example cov(e, F (x)) = 0 by construction, thus in the optimal case LMA fluctuates randomly around 0. Section A has a negative contribution to β, Section B has a postive contribution to β, or differently: a monotonic transformation that changes the sign of the OLS coefficient. This is not reflected by ginireg (or reg). Jan Ditzen (Heriot-Watt University) piecewise ginireg 8. September 2017 10 / 24

piecewise ginireg Introduction Aim: Steps Estimate regression which splits the data into sections determined by the LMA. Split the data until normality conditions of the error terms hold or the sections are small. 1 Run Gini regression using the entire data. 2 Calculate residuals and LMA to determine sections. 3 Check if assumption for normality in the errors within the sections holds, or sections are small enough. If it does, stop; if not, continue. 4 Run a gini regression on each of the sections with the errors as a dependent variable and repeat steps 2-4. Iteration: Step 2-4. Jan Ditzen (Heriot-Watt University) piecewise ginireg 8. September 2017 11 / 24

piecewise ginireg syntax Syntax piecewise ginireg depvar indepvars [ if ], maxiterations(integer) stoppingrule [ minsample(integer) restrict(varlist values) turningpoint(options) ginireg(string) nocontinuous showqui noconstant showiterations drawlma drawreg addconstant bootstrap(string) bootshow multipleregressions(options) ] where either maxiterations(integer) or stoppingrule have to be used. Jan Ditzen (Heriot-Watt University) piecewise ginireg 8. September 2017 12 / 24

piecewise ginireg options stoppingrule and bootstrap() When to stop? with If X and Y are exchangeable random variables, then the gini correlation of Y and X (C(Y, X )) and X and Y (C(X, Y )) are equal. Schröder and Yitzhaki (2016) suggest to split the dataset into two subsamples and test the gini correlations for equality: H 0 : C(Y, X ) = C(X, Y ) H A : C(Y, X ) C(X, Y ) C(Y, X ) = cov (Y, F (X )) cov (Y, F (Y )) Jan Ditzen (Heriot-Watt University) piecewise ginireg 8. September 2017 13 / 24

piecewise ginireg options stoppingrule and bootstrap() If option stoppingrule used, standard errors for gini correlation required. The difference between the two gini correlations, D = C (X, Y ) C (Y, X ), is bootstrapped and then tested with: H 0 : D = 0 vs. H A : D 0. Option bootstrap(p(level) R(#)) sets the p-value and number of replications. Option minsample(#) Alternative rule: minimal size of a section. Default: N/10 Jan Ditzen (Heriot-Watt University) piecewise ginireg 8. September 2017 14 / 24

piecewise ginireg Example. piecewise_ginireg lwage educ, addconstant stoppingrule Piecewise Linear Gini Regression. Dependent Variable: lwage Number of obs = 428 Independent Variables: educ _cons Number of groups = 2 Groupvariables: educ Iterations = 1 Final Results (sum of coefficients) GR = 1.658 Gamma YYhat = 0.321 Gamma YhatY = 0.445 Coef. Std. Err. z P> z [95% Conf. Interval] Final Group Estimates for 5 <= educ <= 13 (N=311) in group 1 educ.086041.047903 1.80 0.072 -.007846.1799286 Final Group Estimates for 14 <= educ <= 17 (N=117) in group 2 educ.256339.277607 0.92 0.356 -.2877608.800438 Sections determined by LMA crossing line of origin (LMA(p) = 0). Bootstrap performed with 50 replications. p-value for test of difference:.1 Jan Ditzen (Heriot-Watt University) piecewise ginireg 8. September 2017 15 / 24

piecewise ginireg Example, including iterations. piecewise_ginireg lwage educ, addconstant stoppingrule showiterations Piecewise Linear Gini Regression. Dependent Variable: lwage Number of obs = 428 Independent Variables: educ _cons Number of groups = 2 Groupvariables: educ Iterations = 1 Iteration: 0, with 1 groups GR = 1.658 Gamma YYhat = 0.321 Gamma YhatY = 0.445 Coef. Std. Err. z P> z [95% Conf. Interval] Estimates for 5 <= educ <= 17 (N=428) educ.105074.01501 7.00 0.000.0756556.1344924 _cons -.139946.192828-0.73 0.468 -.5178824.2379906 Final Results (sum of coefficients) Coef. Std. Err. z P> z [95% Conf. Interval] Final Group Estimates for 5 <= educ <= 13 (N=311) in group 1 educ.086041.047903 1.80 0.072 -.007846.1799286 Final Group Estimates for 14 <= educ <= 17 (N=117) in group 2 educ.256339.277607 0.92 0.356 -.2877608.800438 Sections determined by LMA crossing line of origin (LMA(p) = 0). Bootstrap performed with 50 replications. p-value for test of difference:.1 Jan Ditzen (Heriot-Watt University) piecewise ginireg 8. September 2017 16 / 24

drawlma back. qui piecewise_ginireg lwage educ, addconstant stoppingrule drawlma. estat savegraphs, as(png) path("...") Graph graph_0_educ saved as.../graph_0_educ.png Graph graph_1_educ saved as.../graph_1_educ.png Jan Ditzen (Heriot-Watt University) piecewise ginireg 8. September 2017 17 / 24

piecewise ginireg I Options maxiterations(integer): number of maximum of iterations turningpoint(zero maxmin) specifies the turning point. Default is turningpoint(zero) and the sections are defined by intersections of the LMA with the origin. Alternative is turningpoint(minmax) or turningpoint(maxmin). Then sections are defined by maxima and minima of the LMA curve. restrict(varlist values): specifies group variables and values for sections. For example if the group variable is age and ranges from 10 to 20, 2 sections are wanted, from 10 to 15 and 16 to 20, then restrict(age 15) is used. Jan Ditzen (Heriot-Watt University) piecewise ginireg 8. September 2017 18 / 24

piecewise ginireg II Options nocontinuous no continuous piecewise regression. The constant is included and estimated in all estimations for sections > 2. If not specified, the constant is the predicted value of the last observation in the previous section. It is only included in regression of the first section. All regressions for the following sections are run without a constant. noconstant: suppresses the constant in the first initial regression and in the 1st section of the following iterations. addconstant: adds a constant for the section regressions in iterations > 1. showiterations displays in the output the regression results from all iterations. If not specified only the accumulated results are shown. Jan Ditzen (Heriot-Watt University) piecewise ginireg 8. September 2017 19 / 24

piecewise ginireg Further options and work in progress Implemented drawlma and drawreg Example Saves line graph of LMA and scatter plot of fitted values and independent variable for later use. Can be saved with estat. Postestimation predict: calculation of linear prediction, LMA, residuals and coefficients. Work in progress multipleregressions Allows for more than one independent variable. order(varlist) controls specifies order of variables for determining the sections. groups: first the number of sections for each variable is calculated until convergence is achieved. Then the variables are ordered in as- or descending order of groups. Statistics such as Gini godness of fit Jan Ditzen (Heriot-Watt University) piecewise ginireg 8. September 2017 20 / 24

Conclusion piecewise ginireg... Extends ginireg Determines sections using the LMA. Estimates coefficients for each section. Several criteria for optimal number of sections possible. Alternative names: pwginireg pginireg...any other? Jan Ditzen (Heriot-Watt University) piecewise ginireg 8. September 2017 21 / 24

Definitions See Olkin and Yitzhaki (1992) Gini Mean Difference (GMD) of X and Y : G XY = E X Y G X = 4cov (X, F X (X )) Gini Covariance: Gcov(Y, X ) = cov (Y, F X (X )), with F X population cumulative distribution function. Gini Correlation: C(X, Y ) = Gcov(Y,X ) Gcov(Y,Y ) = Cov(Y,F X (X )) Cov(Y,F Y (Y )) Properties of C(X, Y ): If X and Y are exchangeable random variables, then C(X, Y ) = C(Y, X ). If (X, Y ) has a bivariate normal distribution with means µ x, µ y and variances σx, 2 σy 2 and correlation ρ then C(X, Y ) = C(Y, X ) = ρ If X and Y are random variables, then G X +Y = C(X, X + Y )G x + C(Y, X + Y )G Y. If sample estimator of Gini covariance and the correlations are U-Statistics and asymptotically normal. Jan Ditzen (Heriot-Watt University) piecewise ginireg 8. September 2017 22 / 24

Example back mroz.dta Dataset Estimate wage using education. Jan Ditzen (Heriot-Watt University) piecewise ginireg 8. September 2017 23 / 24

References I Olkin, I. and S. Yitzhaki (1992): Gini Regression Analysis, International Statistical Review/Revue Internationale de Statistique, 60, 185 196. Schaffer, M. E. (2015): ginireg: Program to estimate Gini regression.. Schröder, C. and S. Yitzhaki (2016): Reasonable sample sizes for convergence to normality, Communications in Statistics - Simulation and Computation, 0918, 1 14. Yitzhaki, S. (2015): Gini s mean difference offers a response to Leamer s critique, Metron, 73, 31 43. Yitzhaki, S. and E. Schechtman (2013): The Gini Methodology, vol. 272. Jan Ditzen (Heriot-Watt University) piecewise ginireg 8. September 2017 24 / 24