THE LINEAR PROBABILITY MODEL: USING LEAST SQUARES TO ESTIMATE A REGRESSION EQUATION WITH A DICHOTOMOUS DEPENDENT VARIABLE

PLS 802, Spring 2018
Professor Jacoby

This handout shows the log of a Stata session that performs least-squares regression analysis on some data about state Electoral College votes in the 1992 U.S. presidential election. The model is estimated two ways: first, using ordinary least squares; second, using OLS with robust standard errors.

(FIRST FEW LINES OMITTED)

> * Retrieve the Clinton state voting data
. use clint92;

> * Describe the contents of the dataset,
> * and calculate summary statistics
. describe;

Contains data from clint92.dta
  obs:            48
 vars:             5                          14 Apr 2009 10:09
 size:           864
-------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
-------------------------------------------------------------------------------
state           str2    %9s                   State name
ideol           float   %9.0g                 Ideology of state electorate
party           float   %9.0g                 Partisanship of state electorate
black           float   %9.0g                 Pct African American
vote92          float   %9.0g                 Clinton won state Electoral College vote
-------------------------------------------------------------------------------
Sorted by:

. summarize;

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
       state |         0
       ideol |        48   -11.76667     10.0144      -31.7       16.4
       party |        48    1.110417    14.43695      -48.8       30.1
       black |        48    9.793813    9.341016        .25     35.562
      vote92 |        48    .6458333    .4833211          0          1

> * Use OLS to estimate the coefficients of
> * a regression equation with clinton vote as the
> * dependent variable, electorate ideology, electorate
> * partisanship, and pct African American
> * as independent variables
. regress vote92 ideol party black;

      Source |       SS       df       MS              Number of obs =      48
-------------+------------------------------           F(3, 44)      =   14.76
       Model |  5.50624401     3  1.83541467           Prob > F      =  0.0000
    Residual |  5.47292266    44  .124384606           R-squared     =  0.5015
-------------+------------------------------           Adj R-squared =  0.4675
       Total |  10.9791667    47  .233599291           Root MSE      =  .35268

------------------------------------------------------------------------------
      vote92 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       ideol |   .0232403   .0055294     4.20   0.000     .0120965    .0343841
       party |   .0162446   .0038644     4.20   0.000     .0084563    .0240329
       black |  -.0056811   .0063082    -0.90   0.373    -.0183944    .0070322
       _cons |   .9568955    .087244    10.97   0.000     .7810669    1.132724
------------------------------------------------------------------------------

> * Standardize independent variables and
> * re-estimate model
. egen ideol2 = std(ideol);
. egen party2 = std(party);
. egen black2 = std(black);
. regress vote92 ideol2 party2 black2;

      Source |       SS       df       MS              Number of obs =      48
-------------+------------------------------           F(3, 44)      =   14.76
       Model |  5.50624396     3  1.83541465           Prob > F      =  0.0000
    Residual |  5.47292271    44  .124384607           R-squared     =  0.5015
-------------+------------------------------           Adj R-squared =  0.4675
       Total |  10.9791667    47  .233599291           Root MSE      =  .35268

------------------------------------------------------------------------------
      vote92 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      ideol2 |   .2327376   .0553736     4.20   0.000     .1211394    .3443358
      party2 |   .2345228   .0557908     4.20   0.000     .1220839    .3469618
      black2 |  -.0530673   .0589248    -0.90   0.373    -.1718224    .0656877
       _cons |   .6458333   .0509053    12.69   0.000     .5432405    .7484262
------------------------------------------------------------------------------

> * Obtain predicted values, sort dataset, and print data
. predict yhat, xb;
. sort yhat;
. list state ideol party black
>      vote92 yhat;

     +-----------------------------------------------------+
     | state   ideol   party    black   vote92        yhat |
     |-----------------------------------------------------|
  1. |    WY   -18.6   -48.8     .881        0   -.2731164 |
  2. |    UT   -24.4   -32.2     .696        0   -.1371984 |
  3. |    MS   -31.7    -4.5   35.562        0   -.0549541 |
  4. |    ID   -21.3     -26     .298        0    .0378242 |
  5. |    SC   -26.7     -.7   29.825        0    .1555694 |
  6. |    SD   -19.4   -16.1     .431        0    .2420469 |
  7. |    VA   -17.8    -8.9   18.797        0    .2918534 |
  8. |    AL   -23.5     2.9   25.266        0    .3143192 |
  9. |    AZ     -13   -13.3    3.029        0    .4215102 |
 10. |    NH   -18.2    -6.4     .631        1    .4263718 |
 11. |    NE   -10.5     -14    3.612        0    .4649276 |
 12. |    NC     -23    10.7   21.964        0    .4714064 |
 13. |    TX   -20.4     5.5   11.903        0    .5045167 |
 14. |    OK   -26.9    15.6    7.438        0    .5428917 |
 15. |    OR   -12.7    -5.4    1.619        1    .5648251 |
 16. |    FL   -12.9      .7   13.603        0    .5911869 |
 17. |    IN   -13.1     -.8    7.792        0    .5951848 |
 18. |    VT    -7.3   -10.9     .355        1    .6081582 |
 19. |    KS   -13.6      .6    5.771        0    .6177886 |
 20. |    MI      -7    -4.4     13.9        1    .6437697 |
 21. |    AR   -27.7    26.7   15.908        1    .6564957 |
 22. |    CT    -9.1    -2.2    8.336        1     .662313 |
 23. |    NJ    -5.8    -4.4   13.415        1    .6744134 |
 24. |    MO   -13.1     5.2   10.709        1    .6760807 |
 25. |    TN   -18.1    14.2   15.952        1    .6762948 |
 26. |    ND    -6.7    -6.8     .626        0    .6871658 |
 27. |    IL      -8     1.3   14.819        1    .7079028 |
 28. |    WI   -12.5     4.6    5.008        1    .7126661 |
 29. |    OH    -9.7     3.1   10.648        1    .7213306 |
 30. |    LA   -20.6    26.1   30.782        1    .7272542 |
 31. |    GA   -15.7      18   26.968        1     .731218 |
 32. |    PA     -10     4.2    9.174        1    .7406015 |
 33. |    MT     -16    10.8      .25        1    .7590724 |
 34. |    NV      .7      -9    6.572        1    .7896259 |
 35. |    WV   -13.9      13    3.123        1    .8272934 |
 36. |    CO    -5.7     1.9    4.038        1    .8323503 |
 37. |    MN    -6.6     3.1    2.171        1    .8415342 |
 38. |    NM    -5.8     2.9     1.98        1    .8579626 |
 39. |    CA    -2.7     2.3    7.423        1    .8893385 |
 40. |    NY    -2.2     5.3   15.892        1    .9015792 |
 41. |    WA    -2.9     2.1    3.082        1    .9061032 |
 42. |    MD    -6.3    18.5    24.89        1    .9696044 |
 43. |    IA    -4.5      10    1.728        1    1.004943 |
 44. |    KY   -16.3    30.1    7.137        1    1.026496 |
 45. |    DE    16.4    -9.1   16.817        1    1.094671 |
 46. |    ME     2.3     9.2     .407        1    1.157486 |
 47. |    MA     2.3    15.8    4.987        1    1.238682 |
 48. |    RI    15.4    12.8    3.888        1    1.500639 |
     +-----------------------------------------------------+

> * Graph dependent variable against fitted
> * values, and add OLS line. Save graph
> * to file, creating Figure 1
. twoway (scatter vote92 yhat,
>      scheme(s1color)
>      jitter(3)
>      msymbol(oh)
>      mcolor(black)
>      ysize(4.5)
>      xsize(4.5)
>      xaxis(1 2) yaxis (1 2)
>      xtitle("0.957 + 0.016*Party + 0.023*Ideol -0.006*Black", axis(1))
>      ytitle("clinton received state EC vote in 1992", axis(1))
>      ylabel(, axis(1) nogrid)
>      ylabel(, axis(2) nolabel)
>      xlabel(, axis(2) nolabel)
>      legend(off)
>      )
>      (lfit vote92 yhat)
>      ;
. graph export "lpm fit.pdf", replace;
(file lpm fit.pdf written in PDF format)

> * Obtain residuals, construct
> * residual plot, and save.
> * This creates Figure 2.
. predict esubi, residuals;
. graph twoway
>      scatter esubi yhat,
>      ;
(GRAPH OPTIONS OMITTED)
. graph export resid1.pdf, replace;
(file resid1.pdf written in PDF format)

> * Create graph of dependent variable against
> * fitted values, with nonparametric regression
> * line superimposed over points. Save graph
> * to external file, creating Figure 3.
. twoway (scatter vote92 yhat,
>      scheme(s1color)
>      jitter(3)
>      msymbol(oh)
>      mcolor(black)
>      ysize(4.5)
>      xsize(4.5)
>      xaxis(1 2) yaxis (1 2)
>      xtitle("0.957 + 0.016*Party + 0.023*Ideol -0.006*Black", axis(1))
>      ytitle("clinton received state EC vote in 1992", axis(1))
>      ylabel(, axis(1) nogrid)
>      ylabel(, axis(2) nolabel)
>      xlabel(, axis(2) nolabel)
>      legend(off)
>      )
>      (lowess vote92 yhat)
>      ;
. graph export "lowess curve.pdf", replace;
(file lowess curve.pdf written in PDF format)

> * Estimate the same model, but
> * use robust standard errors
> * which correct for the inherent
> * heteroskedasticity of the linear
> * probability model
. regress vote92 ideol party black, vce(robust);

Linear regression                                      Number of obs =      48
                                                       F(3, 44)      =   25.21
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.5015
                                                       Root MSE      =  .35268

------------------------------------------------------------------------------
             |               Robust
      vote92 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       ideol |   .0232403   .0051574     4.51   0.000     .0128463    .0336343
       party |   .0162446   .0030113     5.39   0.000     .0101757    .0223136
       black |  -.0056811   .0048789    -1.16   0.251     -.015514    .0041517
       _cons |   .9568955   .0802364    11.93   0.000     .7951897    1.118601
------------------------------------------------------------------------------

> * Calculate percent correctly predicted and
> * correlation between actual and predicted
> * values of dep variable as
> * alternative goodness of fit measures
. generate dichot_yhat = yhat;
. replace dichot_yhat = 0 if yhat <= 0.5;
(12 real changes made)
. replace dichot_yhat = 1 if yhat > 0.5;
(36 real changes made)
. tabulate vote92 dichot_yhat,
>      cell chi2 ;

+-----------------+
| Key             |
|-----------------|
|    frequency    |
| cell percentage |
+-----------------+

   Clinton |
 won state |
 Electoral |
   College |      dichot_yhat
      vote |         0          1 |     Total
-----------+----------------------+----------
         0 |        11          6 |        17
           |     22.92      12.50 |     35.42
-----------+----------------------+----------
         1 |         1         30 |        31
           |      2.08      62.50 |     64.58
-----------+----------------------+----------
     Total |        12         36 |        48
           |     25.00      75.00 |    100.00

          Pearson chi2(1) =  22.1328   Pr = 0.000
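
The percent correctly predicted can be read off the cross-tabulation above. The following is a worked sketch using only the cell counts and the chi-square statistic reported in that table; the labels PCP and phi are introduced here for illustration and do not appear in the log.

```latex
% Correct predictions are the two diagonal cells of the 2x2 table.
\[
\mathrm{PCP} = \frac{11 + 30}{48} = \frac{41}{48} \approx 0.854
\]
% Baseline for comparison: always predicting the modal outcome (Clinton wins)
% is correct for 31 of the 48 states.
\[
\mathrm{PCP}_{\text{modal}} = \frac{31}{48} \approx 0.646
\]
% For a 2x2 table, the Pearson correlation between the two dichotomies (phi)
% satisfies phi^2 = chi^2 / N.
\[
\phi = \sqrt{\frac{\chi^2}{N}} = \sqrt{\frac{22.1328}{48}} \approx 0.679
\]
```

On this reading, the dichotomized predictions classify about 85 percent of the states correctly, compared with roughly 65 percent for the modal guess. The value of phi is the same correlation that the correlate command in the log reports for dichot_yhat and vote92.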

. correlate dichot_yhat vote92;
(obs=48)

             | dichot~t   vote92
-------------+------------------
 dichot_yhat |   1.0000
      vote92 |   0.6790   1.0000

. display 0.6790^2;
.461041

. log close;

Figure 1: Scatterplot of the dichotomous "Clinton won state" variable versus the best-fitting linear combination of state partisanship, ideology, and race (obtained from the OLS estimates). The data points are jittered in the vertical direction to reduce overplotting problems, and the bivariate OLS regression line is shown.

[Figure 1 plot not reproduced. Vertical axes: "Clinton received state EC vote in 1992" and "Linear prediction". Horizontal axis: "0.957 + 0.016*Party + 0.023*Ideol -0.006*Black".]
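
The horizontal axis of Figure 1 is built from the unstandardized coefficients. As a hedged check on how the standardized estimates in the log relate to them, the sketch below assumes that egen's std() function rescales each variable to mean 0 and standard deviation 1; the symbols b_j and s_j are notation introduced here, with s_j taken from the summarize output.

```latex
% Slope on a standardized predictor = raw slope times that predictor's sample SD.
\[
b_j^{\text{std}} = b_j \, s_j
\]
\begin{align*}
\text{ideol:} \quad & 0.0232403 \times 10.0144 \approx 0.2327 \\
\text{party:} \quad & 0.0162446 \times 14.43695 \approx 0.2345 \\
\text{black:} \quad & -0.0056811 \times 9.341016 \approx -0.0531
\end{align*}
```

These products match the coefficients on ideol2, party2, and black2 in the second regression. Because vote92 itself is not standardized, each value is the change in the predicted probability of a Clinton win for a one-standard-deviation increase in the predictor, and the constant becomes the mean of vote92 (.6458333).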

Figure 2: Residual plot from linear probability model of Clinton voting in 1992 Electoral College. A loess curve is fitted to the data.

[Figure 2 plot not reproduced. Vertical axis: "Residuals". Horizontal axis: "Predicted values".]

Figure 3: Plot of the dichotomous dependent variable (jittered) versus the predicted values from the linear probability model (OLS estimates) with a nonparametric loess curve superimposed over data.

[Figure 3 plot not reproduced. Vertical axes: "Clinton received state EC vote in 1992" and "Linear prediction". Horizontal axis: "0.957 + 0.016*Party + 0.023*Ideol -0.006*Black".]
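
The heteroskedasticity that motivates the robust standard errors follows directly from the dichotomous dependent variable. The short derivation below is a sketch in notation introduced here for illustration (y_i for vote92, p_i for the probability that y_i = 1, and epsilon_i for the disturbance), and it assumes the linear probability model is correctly specified.

```latex
\[
y_i = p_i + \varepsilon_i, \qquad
p_i = \beta_0 + \beta_1\,\mathrm{ideol}_i + \beta_2\,\mathrm{party}_i + \beta_3\,\mathrm{black}_i
\]
% y_i takes only the values 0 and 1, so the disturbance takes only two values.
\[
\varepsilon_i =
\begin{cases}
1 - p_i & \text{with probability } p_i \\
-p_i    & \text{with probability } 1 - p_i
\end{cases}
\]
\[
E(\varepsilon_i) = p_i(1 - p_i) + (1 - p_i)(-p_i) = 0
\]
\[
\operatorname{Var}(\varepsilon_i) = p_i(1 - p_i)^2 + (1 - p_i)\,p_i^2 = p_i(1 - p_i)
\]
```

The disturbance variance therefore changes with p_i, and hence with the independent variables, reaching its maximum at p_i = 0.5. For the same reason, the residuals in a plot such as Figure 2 can fall only on two parallel lines, one for states with vote92 = 0 and one for states with vote92 = 1. The listing of fitted values also shows the other familiar limitation of the model: three predictions fall below 0 and six exceed 1, so they cannot be read as probabilities.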