Bivariate (Simple) Regression Analysis

Revised July 2018

This set of notes shows how to use Stata to estimate a simple (two-variable) regression equation. It assumes that you have set Stata up on your computer (see the Getting Started with Stata handout), and that you have read in the set of data that you want to analyze (see the Reading in Stata Format (.dta) Data Files handout).

In Stata, most tasks can be performed either by issuing commands within the Stata command window, or by using the menus. These notes illustrate both approaches, using the data file GSS2016.DTA (this data file is posted here: https://canvas.harvard.edu/courses/53958).

To estimate the linear regression of a response variable on an explanatory variable, issue the following command:

    regress <depvar> <indepvar>

where you fill in the name of your response variable in place of "<depvar>" and the name of your explanatory variable in place of "<indepvar>". The response variable must always be listed first (in multiple regressions later on, you will add additional explanatory variables to the command). For a regression of vocabulary test score on educational attainment, the command would be:

    regress wordsum educ

Using the Stata menus, you can estimate a simple regression as follows:

click on "Statistics"
click on "Linear models and related"
click on "Linear regression"

A dialog window will open up. Fill in the name of your response variable in the "Dependent variable:" box and the name of your explanatory variable in the "Independent variables:" box, and click "OK".

Whether via the menus or via the command line approach, the following output will appear in the Stata "Results" window:

. regress wordsum educ

      Source |       SS           df       MS      Number of obs   =     1,861
-------------+----------------------------------   F(1, 1859)      =    454.69
       Model |  1348.95828         1  1348.95828   Prob > F        =    0.0000
    Residual |  5515.22442     1,859  2.96676946   R-squared       =    0.1965
-------------+----------------------------------   Adj R-squared   =    0.1961
       Total |   6864.1827     1,860  3.69042081   Root MSE        =    1.7224

------------------------------------------------------------------------------
     wordsum |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .2972705    .013941    21.32   0.000     .2699288    .3246122
       _cons |   1.941593   .1954309     9.93   0.000     1.558306     2.32488
------------------------------------------------------------------------------

The estimates that describe the regression line are in the "Coef." column in the bottom panel of this output; the number beside "_cons" is the "constant" or "intercept" a, here 1.94; the number beside the independent variable (educ) is the "regression coefficient" or "slope" b, here 0.297. The fitted line is therefore: predicted wordsum = 1.94 + 0.297 x educ. The coefficient of determination is reported in the upper right panel as "R-squared" and the standard error of estimate (or residual standard deviation) is reported as "Root MSE".
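If you want to see how the summary statistics in the upper right panel follow from the analysis of variance table, the arithmetic can be reproduced outside Stata. The following is a minimal sketch in Python (not part of the original handout, and not a Stata feature); it only re-uses numbers copied from the output above, and the variable names are chosen here for illustration.

```python
import math

# Values copied from the regress output above.
ss_model, df_model = 1348.95828, 1
ss_resid, df_resid = 5515.22442, 1859
ss_total, df_total = 6864.1827, 1860
b_educ, se_educ = 0.2972705, 0.013941

# Mean squares are sums of squares divided by their degrees of freedom.
ms_model = ss_model / df_model
ms_resid = ss_resid / df_resid
ms_total = ss_total / df_total

r_squared = ss_model / ss_total           # share of variance explained
adj_r_squared = 1 - ms_resid / ms_total   # R-squared adjusted for degrees of freedom
f_stat = ms_model / ms_resid              # F(1, 1859)
root_mse = math.sqrt(ms_resid)            # standard error of estimate
t_educ = b_educ / se_educ                 # t statistic for the slope

print(f"R-squared     = {r_squared:.4f}")
print(f"Adj R-squared = {adj_r_squared:.4f}")
print(f"F(1, 1859)    = {f_stat:.2f}")
print(f"Root MSE      = {root_mse:.4f}")
print(f"t for educ    = {t_educ:.2f}")
```

The printed values should agree with the Stata output to the number of digits Stata displays. Note also that with a single explanatory variable the F statistic is the square of the slope's t statistic.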

At the upper left is an analysis of variance table that leads to the F statistic reported at the upper right. At the bottom are standard errors and t-statistics for the slope and intercept, as well as 95% confidence intervals for those coefficients.

If you want Stata to print the standardized (beta) coefficients, select the "Reporting" tab of the regression dialog and check the box for "Standardized beta coefficients" there, before clicking "OK". Alternatively, add the "beta" option to the command-line version of this command, as follows for this example:

    regress wordsum educ, beta

Either way, you will see the following output:

. regress wordsum educ, beta

      Source |       SS           df       MS      Number of obs   =     1,861
-------------+----------------------------------   F(1, 1859)      =    454.69
       Model |  1348.95828         1  1348.95828   Prob > F        =    0.0000
    Residual |  5515.22442     1,859  2.96676946   R-squared       =    0.1965
-------------+----------------------------------   Adj R-squared   =    0.1961
       Total |   6864.1827     1,860  3.69042081   Root MSE        =    1.7224

------------------------------------------------------------------------------
     wordsum |      Coef.   Std. Err.      t    P>|t|                     Beta
-------------+----------------------------------------------------------------
        educ |   .2972705    .013941    21.32   0.000                 .4433073
       _cons |   1.941593   .1954309     9.93   0.000                        .
------------------------------------------------------------------------------

As you can see, this is identical to the earlier output, except that the "Beta" coefficient (always equal to the Pearson correlation, for bivariate regression) appears in place of the confidence interval.

Predicted values of the dependent variable based on a regression are often useful. To obtain these, issue the following command immediately after you estimate the regression:

    predict pred, xb

This will calculate the predicted value of the dependent variable for each observation and store it as a new variable named "pred" (you can select a different variable name than "pred" if you want). Alternatively, you can use the "Postestimation" menu to do this, again immediately after you estimate the regression:

Click on "Statistics"
Click on "Postestimation"
Click on the arrow next to "Predictions"
Select "Predictions and their SEs"
Click "Launch"
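Returning briefly to the standardized coefficient: the statement that "Beta" equals the Pearson correlation in a bivariate regression can be checked by hand. The sketch below is an illustration in Python, not part of the original handout. The first part uses only numbers copied from the regression output; the second part uses a small made-up data set (the lists x and y are hypothetical, not taken from GSS2016.DTA) to show that slope x sd(x) / sd(y) and the correlation coincide.

```python
import math

# Part 1: values copied from the regress output.
ss_model, ss_total = 1348.95828, 6864.1827
b_educ = 0.2972705

r_squared = ss_model / ss_total
# In a bivariate regression, Beta is the square root of R-squared
# carrying the sign of the slope.
beta_from_output = math.copysign(math.sqrt(r_squared), b_educ)
print(f"Beta implied by R-squared: {beta_from_output:.4f}")

# Part 2: hypothetical toy data (NOT from GSS2016.DTA).
x = [8, 10, 12, 12, 14, 16, 16, 18]
y = [3, 4, 5, 6, 6, 7, 8, 9]


def mean(values):
    return sum(values) / len(values)


def sd(values):
    """Sample standard deviation (n - 1 in the denominator)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


mx, my = mean(x), mean(y)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
sxx = sum((xi - mx) ** 2 for xi in x)

slope = sxy / sxx                                   # least-squares slope b
beta = slope * sd(x) / sd(y)                        # standardized slope
pearson_r = sxy / ((len(x) - 1) * sd(x) * sd(y))    # Pearson correlation

print(f"toy-data beta = {beta:.6f}, Pearson r = {pearson_r:.6f}")
```

The first printed value should match the "Beta" entry for educ to the digits shown, and the two toy-data values should be identical apart from floating-point rounding.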

A dialog window for the predictions will then appear.

Fill in the name of the variable that is to store the predicted values in the "New variable name:" box. Leave the "Linear prediction (xb)" button in the "Produce:" box checked. Then click "OK".

Either way (command line or menus), you will see little if any output in the Stata Results window. If there are any missing values on the new variable, you will be told so. For example:

    (9 missing values generated)

You will see a new variable name in the "Variables" window at the right of the Stata screen, however. Henceforth you can refer to this ("pred", in this case) just like any other variable in the data set, when executing other Stata commands.
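To make concrete what the linear prediction is, the sketch below (in Python, added for illustration and not part of the original handout) applies the estimated intercept and slope to a few values of educ. The coefficients are copied from the regression output; the list of educ values is hypothetical, and None is used here as a stand-in for a Stata missing value. An observation with a missing explanatory variable gets a missing prediction, which is the kind of case a "missing values generated" message reports.

```python
# Coefficients copied from the "Coef." column of the regress output.
INTERCEPT = 1.941593   # _cons
SLOPE = 0.2972705      # educ


def linear_prediction(educ):
    """Return a + b * educ, or None when educ is missing (None)."""
    if educ is None:
        return None
    return INTERCEPT + SLOPE * educ


# Hypothetical years of education; None stands in for a missing value.
educ_values = [12, 16, None, 20, 8]

pred = [linear_prediction(e) for e in educ_values]
n_missing = sum(p is None for p in pred)

for e, p in zip(educ_values, pred):
    shown = "missing" if p is None else f"{p:.3f}"
    print(f"educ = {e}: predicted wordsum = {shown}")
print(f"missing predictions: {n_missing}")
```

For example, a respondent with 12 years of education has a predicted vocabulary score of about 1.94 + 0.297 x 12, or roughly 5.5.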