PLS 802
Spring 2018
Professor Jacoby

THE LINEAR PROBABILITY MODEL: USING LEAST SQUARES TO ESTIMATE A REGRESSION EQUATION WITH A DICHOTOMOUS DEPENDENT VARIABLE

This handout shows the log of a Stata session that performs least-squares regression analysis on some data about state Electoral College votes in the 1992 U.S. presidential election. The model is estimated two ways: First, using ordinary least squares; second, using OLS with robust standard errors.

(FIRST FEW LINES OMITTED)

> * Retrieve the Clinton state voting data
. use clint92;

> * Describe the contents of the dataset,
> * and calculate summary statistics
. describe;

Contains data from clint92.dta
  obs:            48
 vars:             5                          14 Apr 2009 10:09
 size:           864
-------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
-------------------------------------------------------------------------------
state           str2    %9s                   State name
ideol           float   %9.0g                 Ideology of state electorate
party           float   %9.0g                 Partisanship of state electorate
black           float   %9.0g                 Pct African American
vote92          float   %9.0g                 Clinton won state Electoral College vote
-------------------------------------------------------------------------------
Sorted by:

. summarize;

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       state |         0
       ideol |        48   -11.76667     10.0144      -31.7       16.4
       party |        48    1.110417    14.43695      -48.8       30.1
       black |        48    9.793813    9.341016        .25     35.562
      vote92 |        48    .6458333    .4833211          0          1

> * Use OLS to estimate the coefficients of
> * a regression equation with clinton vote as the
> * dependent variable, electorate ideology, electorate
> * partisanship, and pct African American
> * as independent variables
. regress vote92 ideol party black;

      Source |       SS       df       MS              Number of obs =      48
-------------+------------------------------           F(3, 44)      =   14.76
       Model |  5.50624401     3  1.83541467           Prob > F      =  0.0000
    Residual |  5.47292266    44  .124384606           R-squared     =  0.5015
-------------+------------------------------           Adj R-squared =  0.4675
       Total |  10.9791667    47  .233599291           Root MSE      =  .35268

------------------------------------------------------------------------------
      vote92 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       ideol |   .0232403   .0055294     4.20   0.000     .0120965    .0343841
       party |   .0162446   .0038644     4.20   0.000     .0084563    .0240329
       black |  -.0056811   .0063082    -0.90   0.373    -.0183944    .0070322
       _cons |   .9568955    .087244    10.97   0.000     .7810669    1.132724
------------------------------------------------------------------------------

> * Standardize independent variables and
> * re-estimate model
. egen ideol2 = std(ideol);
. egen party2 = std(party);
. egen black2 = std(black);
. regress vote92 ideol2 party2 black2;

      Source |       SS       df       MS              Number of obs =      48
-------------+------------------------------           F(3, 44)      =   14.76
       Model |  5.50624396     3  1.83541465           Prob > F      =  0.0000
    Residual |  5.47292271    44  .124384607           R-squared     =  0.5015
-------------+------------------------------           Adj R-squared =  0.4675
       Total |  10.9791667    47  .233599291           Root MSE      =  .35268

------------------------------------------------------------------------------
      vote92 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ideol2 |   .2327376   .0553736     4.20   0.000     .1211394    .3443358
      party2 |   .2345228   .0557908     4.20   0.000     .1220839    .3469618
      black2 |  -.0530673   .0589248    -0.90   0.373    -.1718224    .0656877
       _cons |   .6458333   .0509053    12.69   0.000     .5432405    .7484262
------------------------------------------------------------------------------

> * Obtain predicted values, sort dataset, and print data
. predict yhat, xb;
. sort yhat;
. list state ideol party black
> vote92 yhat;

     +----------------------------------------------------+
     | state  ideol   party    black   vote92        yhat |
     |----------------------------------------------------|
  1. | WY     -18.6   -48.8     .881        0   -.2731164 |
  2. | UT     -24.4   -32.2     .696        0   -.1371984 |
  3. | MS     -31.7    -4.5   35.562        0   -.0549541 |
  4. | ID     -21.3     -26     .298        0    .0378242 |
  5. | SC     -26.7     -.7   29.825        0    .1555694 |
  6. | SD     -19.4   -16.1     .431        0    .2420469 |
  7. | VA     -17.8    -8.9   18.797        0    .2918534 |
  8. | AL     -23.5     2.9   25.266        0    .3143192 |
  9. | AZ       -13   -13.3    3.029        0    .4215102 |
 10. | NH     -18.2    -6.4     .631        1    .4263718 |
 11. | NE     -10.5     -14    3.612        0    .4649276 |
 12. | NC       -23    10.7   21.964        0    .4714064 |
 13. | TX     -20.4     5.5   11.903        0    .5045167 |
 14. | OK     -26.9    15.6    7.438        0    .5428917 |
 15. | OR     -12.7    -5.4    1.619        1    .5648251 |
 16. | FL     -12.9      .7   13.603        0    .5911869 |
 17. | IN     -13.1     -.8    7.792        0    .5951848 |
 18. | VT      -7.3   -10.9     .355        1    .6081582 |
 19. | KS     -13.6      .6    5.771        0    .6177886 |
 20. | MI        -7    -4.4     13.9        1    .6437697 |
 21. | AR     -27.7    26.7   15.908        1    .6564957 |
 22. | CT      -9.1    -2.2    8.336        1     .662313 |
 23. | NJ      -5.8    -4.4   13.415        1    .6744134 |
 24. | MO     -13.1     5.2   10.709        1    .6760807 |
 25. | TN     -18.1    14.2   15.952        1    .6762948 |
 26. | ND      -6.7    -6.8     .626        0    .6871658 |
 27. | IL        -8     1.3   14.819        1    .7079028 |
 28. | WI     -12.5     4.6    5.008        1    .7126661 |
 29. | OH      -9.7     3.1   10.648        1    .7213306 |
 30. | LA     -20.6    26.1   30.782        1    .7272542 |
 31. | GA     -15.7      18   26.968        1     .731218 |
 32. | PA       -10     4.2    9.174        1    .7406015 |
 33. | MT       -16    10.8      .25        1    .7590724 |
 34. | NV        .7      -9    6.572        1    .7896259 |
 35. | WV     -13.9      13    3.123        1    .8272934 |
 36. | CO      -5.7     1.9    4.038        1    .8323503 |
 37. | MN      -6.6     3.1    2.171        1    .8415342 |
 38. | NM      -5.8     2.9     1.98        1    .8579626 |
 39. | CA      -2.7     2.3    7.423        1    .8893385 |
 40. | NY      -2.2     5.3   15.892        1    .9015792 |
 41. | WA      -2.9     2.1    3.082        1    .9061032 |
 42. | MD      -6.3    18.5    24.89        1    .9696044 |
 43. | IA      -4.5      10    1.728        1    1.004943 |
 44. | KY     -16.3    30.1    7.137        1    1.026496 |
 45. | DE      16.4    -9.1   16.817        1    1.094671 |
 46. | ME       2.3     9.2     .407        1    1.157486 |
 47. | MA       2.3    15.8    4.987        1    1.238682 |
 48. | RI      15.4    12.8    3.888        1    1.500639 |
     +----------------------------------------------------+

> * Graph dependent variable against fitted
> * values, and add OLS line. Save graph
> * to file, creating Figure 1
. twoway (scatter vote92 yhat,
> scheme(s1color)
> jitter(3)
> msymbol(oh)
> mcolor(black)
> ysize(4.5)
> xsize(4.5)
> xaxis(1 2) yaxis (1 2)
> xtitle("0.957 + 0.016*Party + 0.023*Ideol -0.006*Black", axis(1))
> ytitle("clinton received state EC vote in 1992", axis(1))
> ylabel(, axis(1) nogrid)
> ylabel(, axis(2) nolabel)
> xlabel(, axis(2) nolabel)
> legend(off)
> )
> (lfit vote92 yhat)
> ;
. graph export "lpm fit.pdf", replace;
(file lpm fit.pdf written in PDF format)

> * Obtain residuals, construct
> * residual plot, and save.
> * This creates Figure 2.
. predict esubi, residuals;
. graph twoway
> scatter esubi yhat,
> ;
(GRAPH OPTIONS OMITTED)
. graph export resid1.pdf, replace;
(file resid1.pdf written in PDF format)

> * Create graph of dependent variable against
> * fitted values, with nonparametric regression
> * line superimposed over points. Save graph
> * to external file, creating Figure 3.
. twoway (scatter vote92 yhat,
> scheme(s1color)
> jitter(3)
> msymbol(oh)
> mcolor(black)
> ysize(4.5)
> xsize(4.5)
> xaxis(1 2) yaxis (1 2)
> xtitle("0.957 + 0.016*Party + 0.023*Ideol -0.006*Black", axis(1))
> ytitle("clinton received state EC vote in 1992", axis(1))
> ylabel(, axis(1) nogrid)
> ylabel(, axis(2) nolabel)
> xlabel(, axis(2) nolabel)
> legend(off)
> )
> (lowess vote92 yhat)
> ;
. graph export "lowess curve.pdf", replace;
(file lowess curve.pdf written in PDF format)

> * Estimate the same model, but
> * use robust standard errors
> * which correct for the inherent
> * heteroskedasticity of the linear
> * probability model
. regress vote92 ideol party black, vce(robust);

Linear regression                               Number of obs     =         48
                                                F(3, 44)          =      25.21
                                                Prob > F          =     0.0000
                                                R-squared         =     0.5015
                                                Root MSE          =     .35268

------------------------------------------------------------------------------
             |               Robust
      vote92 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       ideol |   .0232403   .0051574     4.51   0.000     .0128463    .0336343
       party |   .0162446   .0030113     5.39   0.000     .0101757    .0223136
       black |  -.0056811   .0048789    -1.16   0.251     -.015514    .0041517
       _cons |   .9568955   .0802364    11.93   0.000     .7951897    1.118601
------------------------------------------------------------------------------

> * Calculate percent correctly predicted and
> * correlation between actual and predicted
> * values of dep variable as
> * alternative goodness of fit measures
. generate dichot_yhat = yhat;
. replace dichot_yhat = 0 if yhat <= 0.5;
(12 real changes made)
. replace dichot_yhat = 1 if yhat > 0.5;
(36 real changes made)
. tabulate vote92 dichot_yhat,
> cell chi2 ;

+-----------------+
| Key             |
|-----------------|
|    frequency    |
| cell percentage |
+-----------------+

   Clinton |
 won state |
 Electoral |
   College |      dichot_yhat
      vote |         0          1 |     Total
-----------+----------------------+----------
         0 |        11          6 |        17
           |     22.92      12.50 |     35.42
-----------+----------------------+----------
         1 |         1         30 |        31
           |      2.08      62.50 |     64.58
-----------+----------------------+----------
     Total |        12         36 |        48
           |     25.00      75.00 |    100.00

          Pearson chi2(1) =  22.1328   Pr = 0.000
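
The cross-tabulation above gives the percent correctly predicted only implicitly: the two diagonal cells hold 11 + 30 = 41 of the 48 states, or about 85.4 percent (22.92 + 62.50 in cell percentages). As a minimal sketch, not part of the original session, the same figure could be computed directly. The variable name `correct` is hypothetical, and the commands are written for Stata's default line delimiter rather than the semicolon delimiter that the log appears to use.

```stata
* Sketch only: assumes vote92 and dichot_yhat exist as created in the log.
* "correct" is a hypothetical variable name, not part of the original session.
generate byte correct = (vote92 == dichot_yhat)

* The mean of a 0/1 indicator is the proportion of states classified correctly.
summarize correct
display "Percent correctly predicted: " %5.2f 100*r(mean)
```

This relies on the 0.5 cutoff used in the log; a different cutoff would change which states count as correctly predicted.
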
. correlate dichot_yhat vote92;
(obs=48)

             | dichot~t   vote92
-------------+------------------
 dichot_yhat |   1.0000
      vote92 |   0.6790   1.0000

. display 0.6790^2;
.461041

. log close;

Figure 1: Scatterplot of dichotomous Clinton won state variable versus the best-fitting linear combination of state partisanship, ideology, and race (obtained from the OLS estimates). The data points are jittered in the vertical direction to reduce overplotting problems, and the bivariate OLS regression line is shown.

[Figure 1 plot not reproduced. Axis labels: "Linear prediction"; "Clinton received state EC vote in 1992"; "0.957 + 0.016*Party + 0.023*Ideol -0.006*Black".]
Figure 2: Residual plot from linear probability model of Clinton voting in 1992 Electoral College. A loess curve is fitted to the data.

[Figure 2 plot not reproduced. Axis labels: "Residuals"; "Predicted values".]

Figure 3: Plot of the dichotomous dependent variable (jittered) versus the predicted values from the linear probability model (OLS estimates) with a nonparametric loess curve superimposed over data.

[Figure 3 plot not reproduced. Axis labels: "Linear prediction"; "Clinton received state EC vote in 1992"; "0.957 + 0.016*Party + 0.023*Ideol -0.006*Black".]
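
The listing of fitted values in the log shows predictions that fall outside the 0 to 1 range: three states (WY, UT, MS) have yhat below 0, and six (IA, KY, DE, ME, MA, RI) have yhat above 1. The following is a hedged sketch, not part of the original session, of how those cases might be counted. It assumes yhat was created by `predict yhat, xb` as in the log and uses Stata's default line delimiter.

```stata
* Sketch only: assumes yhat was created by "predict yhat, xb" as in the log.
* Fitted values below zero cannot be interpreted as probabilities.
count if yhat < 0

* Stata treats missing values as larger than any number, so exclude them
* explicitly when testing the upper bound.
count if yhat > 1 & !missing(yhat)
```

The `!missing(yhat)` condition makes no difference in this dataset, where all 48 states have a fitted value, but it keeps the count correct if the commands are reused on data with missing predictions.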