INTRODUCTION TO PANEL DATA ANALYSIS

1 INTRODUCTION TO PANEL DATA ANALYSIS USING EVIEWS
FARIDAH NAJUNA MISMAN, PhD
FINANCE DEPARTMENT, FACULTY OF BUSINESS & MANAGEMENT, UiTM JOHOR
PANEL DATA WORKSHOP-23&24 MAY

2 OUTLINE
1. Introduction
2. CLRM Assumptions
3. Static Panel Data Models
4. Getting Started with EViews 9
5. Data Analysis
6. Reading the Results

3 1. INTRODUCTION
There are three types of data structure:
1. Time series data are collected at regular time intervals, such as every month or every year (N = 1; t = 1, ..., T). They usually represent the values of a single firm or a single variable at different points in time. Most macroeconomic data for real variables, e.g. GDP or consumption, are quarterly time series. Data for monetary variables such as interest rates are often monthly time series.
2. Cross-sectional data are the values of many different firms or households collected at a single point in time (i = 1, ..., N; T = 1).
3. Panel data combine the other two: values for all members of a panel or group of firms or households are measured at more than one period in time (i = 1, ..., N; t = 1, ..., T).

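4 1. INTRODUCTION
Classical panel: N > T, also known as a short or micro panel.
Macro panel: T > N, also known as a long panel.
Balanced panel: data are available for all cross-sections in all periods. Number of observations: n = NT.
Unbalanced panel: T differs across individuals. (Note: the original slide states that EViews cannot read an unbalanced panel. However, the regression output in Section 6 reports "Total panel (unbalanced) observations: 85", so EViews does estimate on unbalanced panels.)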
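The balanced/unbalanced distinction above can be checked mechanically before loading data into any package. The following is a minimal Python sketch, not part of the workshop material; the function name `panel_shape` and the firm/year data are hypothetical and serve only as an illustration.

```python
from collections import defaultdict

def panel_shape(observations):
    """Return (N, T, n_obs, balanced) for an iterable of (unit, period) pairs."""
    periods_by_unit = defaultdict(set)
    for unit, period in observations:
        periods_by_unit[unit].add(period)
    all_periods = set().union(*periods_by_unit.values())
    n_units = len(periods_by_unit)      # N: number of cross-sections
    n_periods = len(all_periods)        # T: number of distinct periods
    n_obs = sum(len(p) for p in periods_by_unit.values())
    balanced = n_obs == n_units * n_periods   # balanced panel: n = NT
    return n_units, n_periods, n_obs, balanced

# Hypothetical firms A, B, C observed over 2015-2017.
balanced_obs = [(firm, year) for firm in "ABC" for year in (2015, 2016, 2017)]
# Dropping one firm-year makes the panel unbalanced.
unbalanced_obs = [obs for obs in balanced_obs if obs != ("B", 2016)]

print(panel_shape(balanced_obs))    # (3, 3, 9, True)
print(panel_shape(unbalanced_obs))  # (3, 3, 8, False)
```

In the balanced case the observation count equals N x T; in the unbalanced case it falls short of it.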

5 1. INTRODUCTION
The choice of econometric model depends on the type of data:
1. Least squares regression: normally applied to cross-sectional data sets (e.g. Ordinary Least Squares, OLS).
2. Time series models: normally applied to time series data, to uncover long-run relations and short-run dynamics.
3. Panel data modelling: normally used to capture heterogeneity across samples and to obtain a bigger sample size.
   Static panel data models: pooled OLS (POLS), fixed effects (FE), random effects (RE), between effects (BE)
   Dynamic panel data models: GMM
   Panel unit root and cointegration (macro panel)

6 1. INTRODUCTION
Advantages & Disadvantages
Panel data allow us to control for variables that cannot be observed or measured, such as:
o Time-invariant factors, e.g. geographical area or firm management characteristics.
o Variables that change over time but not across entities, e.g. national policies, federal regulations, international agreements.
In other words, panel data can account for individual heterogeneity (uniqueness), which results in more efficient estimates.

7 1. INTRODUCTION
Advantages:
i. Larger sample size, more variation and less collinearity, which increases the precision of the estimates.
ii. Ability to study dynamics: repeated cross-sectional observations show adjustment over time.
iii. Ability to account for heterogeneity across individuals, which is often ignored in pooled data; more robust against misspecification due to omitted variables.
Disadvantages:
i. Data availability/maintenance
ii. Measurement errors
iii. Self-selection bias

8 1. INTRODUCTION
Why Analyse Panel Data?
We are interested in describing change over time:
o social change, e.g. changing attitudes, behaviours, social relationships
o individual growth or development, e.g. life-course studies, child development, career trajectories, school achievement
o occurrence (or non-occurrence) of events
We want superior estimates of trends in social phenomena:
o Panel models can be used to inform policy, e.g. health, obesity
o Multiple observations on each unit can provide superior estimates compared with cross-sectional models of association
We want to estimate causal models:
o Policy evaluation
o Estimation of treatment effects

9 1. INTRODUCTION
What kind of data are required for panel analysis?
Basic panel methods require at least two waves of measurement.
Consider student GPAs and job hours during two semesters of college.
One way to organize the panel data is to create a single record for each combination of unit and time period.
Notice that the data include:
o A time-invariant unique identifier for each unit (StudentID)
o A time-varying outcome (GPA)
o An indicator for time (Semester)
Panel datasets can include other time-varying or time-invariant variables.
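The "one record per unit and time period" layout described above can be sketched in Python as follows. The values, the wide-format column names (`GPA_1`, `JobHours_2`, ...) and the helper `wide_to_long` are hypothetical; only StudentID, GPA and Semester come from the slide.

```python
# Hypothetical wide-format data: one row per student, one column per semester.
wide = [
    {"StudentID": 1, "GPA_1": 3.2, "GPA_2": 3.4, "JobHours_1": 10, "JobHours_2": 8},
    {"StudentID": 2, "GPA_1": 2.9, "GPA_2": 2.7, "JobHours_1": 20, "JobHours_2": 25},
]

def wide_to_long(rows, id_key, variables, periods):
    """Create one record per (unit, period) combination."""
    long_rows = []
    for row in rows:
        for t in periods:
            record = {id_key: row[id_key], "Semester": t}  # identifier + time indicator
            for var in variables:
                record[var] = row[f"{var}_{t}"]            # time-varying values
            long_rows.append(record)
    return long_rows

long_rows = wide_to_long(wide, "StudentID", ["GPA", "JobHours"], [1, 2])
for record in long_rows:
    print(record)
```

Each student now contributes two records, one per semester, which is the stacked layout that panel estimators generally expect.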

10 2. CLASSICAL LINEAR REGRESSION MODEL (CLRM)
Table taken from page 37 of Asteriou & Hall, Applied Econometrics, 2nd ed., 2011, Palgrave Macmillan.

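11 3. PANEL DATA MODEL: POOLED OLS
Pooled OLS

$$y_{it} = \beta_0 + \beta X_{it} + \alpha_i + \nu_{it}$$

Assumptions (in ii to iv the slide uses $j$ to index the observations of unit $i$, playing the role of $t$ in the model above):
i. $\alpha_i$ and $\nu_{it}$ are normally distributed and mutually independent;
ii. $E(\alpha_i) = E(\nu_{ij}) = 0$ for $i = 1, \dots, m$ and $j = 1, 2, \dots, m(i)$;
iii. $E(\alpha_i \alpha_{i'}) = \sigma_1^2$ if $i = i'$, and $0$ otherwise;
iv. $E(\nu_{ij} \nu_{i'j'}) = \sigma_2^2$ if $i = i'$ and $j = j'$, and $0$ otherwise.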
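To make the pooled OLS idea concrete, here is a minimal Python sketch for a single regressor. It stacks all unit-period observations and fits one intercept and one slope, so the unit effect is left in the error term. The data and the function name `pooled_ols` are hypothetical, and this is a plain-arithmetic illustration, not how EViews computes the estimates.

```python
def pooled_ols(rows, y_key, x_key):
    """Pooled OLS with one regressor: ignore the panel structure and
    fit y = beta0 + beta * x on the stacked observations."""
    n = len(rows)
    x_bar = sum(r[x_key] for r in rows) / n
    y_bar = sum(r[y_key] for r in rows) / n
    s_xy = sum((r[x_key] - x_bar) * (r[y_key] - y_bar) for r in rows)
    s_xx = sum((r[x_key] - x_bar) ** 2 for r in rows)
    beta = s_xy / s_xx              # slope
    beta0 = y_bar - beta * x_bar    # intercept
    return beta0, beta

# Hypothetical stacked panel: two students observed in two semesters.
panel = [
    {"StudentID": 1, "Semester": 1, "GPA": 3.2, "JobHours": 10},
    {"StudentID": 1, "Semester": 2, "GPA": 3.4, "JobHours": 8},
    {"StudentID": 2, "Semester": 1, "GPA": 2.9, "JobHours": 20},
    {"StudentID": 2, "Semester": 2, "GPA": 2.7, "JobHours": 25},
]

beta0, beta = pooled_ols(panel, "GPA", "JobHours")
print(f"intercept = {beta0:.3f}, slope = {beta:.4f}")
```

Note that StudentID and Semester play no role in the estimation; that is exactly what "pooled" means here.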

12 4. GETTING STARTED WITH EViews 9

21 5. DATA ANALYSIS

22 DESCRIPTIVE STATISTICS

24 CORRELATION ANALYSIS

28 POOLED OLS REGRESSION

32 NORMALITY TEST

35 DUMMY VARIABLES

41 6. READING THE RESULTS
Dependent Variable: CR
Method: Panel Least Squares
Date: 05/23/17 Time: 17:06
Sample (adjusted):
Periods included: 16
Cross-sections included: 17
Total panel (unbalanced) observations: 85

Variable | Coefficient | Std. Error | t-Statistic | Prob.
C
FE
FQ
CB
CAPR

R-squared | Mean dependent var
Adjusted R-squared | S.D. dependent var
S.E. of regression | Akaike info criterion
Sum squared resid | Schwarz criterion
Log likelihood | Hannan-Quinn criter.
F-statistic | Durbin-Watson stat
Prob(F-statistic)

Notes on the output:
o Periods included: the number of time periods included.
o Cross-sections included: the total number of groups.
o Total panel observations: n = NT in a balanced panel. This panel is unbalanced, so there are 85 observations rather than 16 x 17 = 272.
o C: the constant.
o Prob(F-statistic): if this number is < 0.05, the model is acceptable. This is the F test of whether the coefficients in the model are jointly different from zero.

42 Coefficient | Std. Error | t-Statistic | Prob.
Coefficient: the coefficients of the regressors indicate how much Y changes when X increases by one unit, holding the other regressors constant.
t-Statistic: the t-values test the null hypothesis that each coefficient is equal to 0. To reject it at the 5% significance level (95% confidence), the absolute t-value has to be higher than about 1.96. If this is the case, you can say that the variable has a statistically significant influence on your dependent variable (Y). The larger the absolute t-value, the stronger the evidence against a zero coefficient.
Prob.: the two-tailed p-values test the same null hypothesis. To reject it, the p-value has to be lower than 0.05 (95% confidence). If this is the case, you can say that the variable has a significant influence on your dependent variable (Y).
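The link between the 1.96 threshold and the 0.05 p-value can be illustrated with a short Python sketch. It uses the standard normal approximation, which is an assumption made here for simplicity: regression software such as EViews reports p-values from the t distribution with the model's degrees of freedom, so reported values will differ slightly in small samples. The coefficient and standard error are hypothetical.

```python
import math

def t_statistic(coef, std_err):
    """t-value for the null hypothesis that the coefficient is zero."""
    return coef / std_err

def two_sided_p_normal(t):
    """Two-tailed p-value under the standard normal approximation
    (reasonable only when the sample is large)."""
    return math.erfc(abs(t) / math.sqrt(2))

# Hypothetical coefficient and standard error.
t = t_statistic(0.49, 0.25)
p = two_sided_p_normal(t)
print(round(t, 2), round(p, 3))  # 1.96 0.05
```

A t-value of 1.96 in absolute terms therefore sits right at the 5% boundary; larger absolute t-values give smaller p-values.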

43 [Lower block of the output: R-squared, Adjusted R-squared, S.E. of regression, Sum squared resid, Log likelihood, F-statistic, Prob(F-statistic), Mean dependent var, S.D. dependent var, Akaike info criterion, Schwarz criterion, Hannan-Quinn criter., Durbin-Watson stat]
R-squared shows the proportion of the variance of Y explained by X.
Adjusted R-squared shows the same as R-squared, but adjusted for the number of cases and the number of variables. When the number of variables is small and the number of cases is very large, adjusted R-squared is close to R-squared.
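The adjustment can be written as adjusted R-squared = 1 - (1 - R-squared)(n - 1)/(n - k), where n is the number of observations and k the number of estimated coefficients including the constant. The Python sketch below applies this standard formula; the R-squared value of 0.40 is hypothetical, while n = 85 and k = 5 (C, FE, FQ, CB, CAPR) are taken from the output above, assuming no additional effect dummies are estimated.

```python
def adjusted_r_squared(r2, n_obs, n_coefs):
    """Adjusted R-squared; n_coefs counts all estimated coefficients,
    including the constant."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_coefs)

r2 = 0.40  # hypothetical value, for illustration only

small_sample = adjusted_r_squared(r2, n_obs=85, n_coefs=5)
large_sample = adjusted_r_squared(r2, n_obs=8500, n_coefs=5)

print(round(small_sample, 4))
print(round(large_sample, 4))
```

With the same number of variables, the larger sample gives an adjusted R-squared much closer to the unadjusted value, which is the point made on the slide.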


More information

Missing Data Missing Data Methods in ML Multiple Imputation

Missing Data Missing Data Methods in ML Multiple Imputation Missing Data Missing Data Methods in ML Multiple Imputation PRE 905: Multivariate Analysis Lecture 11: April 22, 2014 PRE 905: Lecture 11 Missing Data Methods Today s Lecture The basics of missing data:

More information

Statistical Analysis of List Experiments

Statistical Analysis of List Experiments Statistical Analysis of List Experiments Kosuke Imai Princeton University Joint work with Graeme Blair October 29, 2010 Blair and Imai (Princeton) List Experiments NJIT (Mathematics) 1 / 26 Motivation

More information

Departments of Economics and Agricultural and Applied Economics Ph.D. Written Qualifying Examination August 2010 will not required

Departments of Economics and Agricultural and Applied Economics Ph.D. Written Qualifying Examination August 2010 will not required Departments of Economics and Agricultural and Applied Economics Ph.D. Written Qualifying Examination August 2010 Purpose All Ph.D. students are required to take the written Qualifying Examination. The

More information

Data Analysis Guidelines

Data Analysis Guidelines Data Analysis Guidelines DESCRIPTIVE STATISTICS Standard Deviation Standard deviation is a calculated value that describes the variation (or spread) of values in a data set. It is calculated using a formula

More information

Time-Varying Volatility and ARCH Models

Time-Varying Volatility and ARCH Models Time-Varying Volatility and ARCH Models ARCH MODEL AND TIME-VARYING VOLATILITY In this lesson we'll use Gretl to estimate several models in which the variance of the dependent variable changes over time.

More information

Applied Regression Modeling: A Business Approach

Applied Regression Modeling: A Business Approach i Applied Regression Modeling: A Business Approach Computer software help: SPSS SPSS (originally Statistical Package for the Social Sciences ) is a commercial statistical software package with an easy-to-use

More information

Statistics & Analysis. A Comparison of PDLREG and GAM Procedures in Measuring Dynamic Effects

Statistics & Analysis. A Comparison of PDLREG and GAM Procedures in Measuring Dynamic Effects A Comparison of PDLREG and GAM Procedures in Measuring Dynamic Effects Patralekha Bhattacharya Thinkalytics The PDLREG procedure in SAS is used to fit a finite distributed lagged model to time series data

More information

MODULE THREE, PART FOUR: PANEL DATA ANALYSIS IN ECONOMIC EDUCATION RESEARCH USING SAS

MODULE THREE, PART FOUR: PANEL DATA ANALYSIS IN ECONOMIC EDUCATION RESEARCH USING SAS MODULE THREE, PART FOUR: PANEL DATA ANALYSIS IN ECONOMIC EDUCATION RESEARCH USING SAS Part Four of Module Three provides a cookbook-type demonstration of the steps required to use SAS in panel data analysis.

More information

CSC 328/428 Summer Session I 2002 Data Analysis for the Experimenter FINAL EXAM

CSC 328/428 Summer Session I 2002 Data Analysis for the Experimenter FINAL EXAM options pagesize=53 linesize=76 pageno=1 nodate; proc format; value $stcktyp "1"="Growth" "2"="Combined" "3"="Income"; data invstmnt; input stcktyp $ perform; label stkctyp="type of Stock" perform="overall

More information

Spatial Patterns Point Pattern Analysis Geographic Patterns in Areal Data

Spatial Patterns Point Pattern Analysis Geographic Patterns in Areal Data Spatial Patterns We will examine methods that are used to analyze patterns in two sorts of spatial data: Point Pattern Analysis - These methods concern themselves with the location information associated

More information

Heteroskedasticity and Homoskedasticity, and Homoskedasticity-Only Standard Errors

Heteroskedasticity and Homoskedasticity, and Homoskedasticity-Only Standard Errors Heteroskedasticity and Homoskedasticity, and Homoskedasticity-Only Standard Errors (Section 5.4) What? Consequences of homoskedasticity Implication for computing standard errors What do these two terms

More information

Predicting Web Service Levels During VM Live Migrations

Predicting Web Service Levels During VM Live Migrations Predicting Web Service Levels During VM Live Migrations 5th International DMTF Academic Alliance Workshop on Systems and Virtualization Management: Standards and the Cloud Helmut Hlavacs, Thomas Treutner

More information

Econometrics I: OLS. Dean Fantazzini. Dipartimento di Economia Politica e Metodi Quantitativi. University of Pavia

Econometrics I: OLS. Dean Fantazzini. Dipartimento di Economia Politica e Metodi Quantitativi. University of Pavia Dipartimento di Economia Politica e Metodi Quantitativi University of Pavia Overview of the Lecture 1 st EViews Session I: Convergence in the Solow Model 2 Overview of the Lecture 1 st EViews Session I:

More information

Z-TEST / Z-STATISTIC: used to test hypotheses about. µ when the population standard deviation is unknown

Z-TEST / Z-STATISTIC: used to test hypotheses about. µ when the population standard deviation is unknown Z-TEST / Z-STATISTIC: used to test hypotheses about µ when the population standard deviation is known and population distribution is normal or sample size is large T-TEST / T-STATISTIC: used to test hypotheses

More information

set mem 10m we can also decide to have the more separation line on the screen or not when the software displays results: set more on set more off

set mem 10m we can also decide to have the more separation line on the screen or not when the software displays results: set more on set more off Setting up Stata We are going to allocate 10 megabites to the dataset. You do not want to allocate to much memory to the dataset because the more memory you allocate to the dataset, the less memory will

More information

Estimation of Unknown Parameters in Dynamic Models Using the Method of Simulated Moments (MSM)

Estimation of Unknown Parameters in Dynamic Models Using the Method of Simulated Moments (MSM) Estimation of Unknown Parameters in ynamic Models Using the Method of Simulated Moments (MSM) Abstract: We introduce the Method of Simulated Moments (MSM) for estimating unknown parameters in dynamic models.

More information

OLS Assumptions and Goodness of Fit

OLS Assumptions and Goodness of Fit OLS Assumptions and Goodness of Fit A little warm-up Assume I am a poor free-throw shooter. To win a contest I can choose to attempt one of the two following challenges: A. Make three out of four free

More information

Resampling Methods. Levi Waldron, CUNY School of Public Health. July 13, 2016

Resampling Methods. Levi Waldron, CUNY School of Public Health. July 13, 2016 Resampling Methods Levi Waldron, CUNY School of Public Health July 13, 2016 Outline and introduction Objectives: prediction or inference? Cross-validation Bootstrap Permutation Test Monte Carlo Simulation

More information

Introduction to Mplus

Introduction to Mplus Introduction to Mplus May 12, 2010 SPONSORED BY: Research Data Centre Population and Life Course Studies PLCS Interdisciplinary Development Initiative Piotr Wilk piotr.wilk@schulich.uwo.ca OVERVIEW Mplus

More information

Big Data Methods. Chapter 5: Machine learning. Big Data Methods, Chapter 5, Slide 1

Big Data Methods. Chapter 5: Machine learning. Big Data Methods, Chapter 5, Slide 1 Big Data Methods Chapter 5: Machine learning Big Data Methods, Chapter 5, Slide 1 5.1 Introduction to machine learning What is machine learning? Concerned with the study and development of algorithms that

More information

Multiple Imputation for Missing Data. Benjamin Cooper, MPH Public Health Data & Training Center Institute for Public Health

Multiple Imputation for Missing Data. Benjamin Cooper, MPH Public Health Data & Training Center Institute for Public Health Multiple Imputation for Missing Data Benjamin Cooper, MPH Public Health Data & Training Center Institute for Public Health Outline Missing data mechanisms What is Multiple Imputation? Software Options

More information

A Statistical Analysis of UK Financial Networks

A Statistical Analysis of UK Financial Networks A Statistical Analysis of UK Financial Networks J. Chu & S. Nadarajah First version: 31 December 2016 Research Report No. 9, 2016, Probability and Statistics Group School of Mathematics, The University

More information

Introduction to Eviews Updated by Jianguo Wang CSSCR September 2009

Introduction to Eviews Updated by Jianguo Wang CSSCR September 2009 What is EViews? EViews is a software package that provides tools for data analysis, regression, and forecasting. It is often seen as a canned regression package. EViews has an object oriented design. Information

More information

Package MARX. June 16, 2017

Package MARX. June 16, 2017 Package MARX June 16, 2017 Title Simulation, Estimation and Selection of MARX Models Version 0.1 Date 2017-06-16 Author [aut, cre, cph], Alain Hecq [ctb], Lenard Lieb [ctb] Maintainer

More information

Stat 5100 Handout #6 SAS: Linear Regression Remedial Measures

Stat 5100 Handout #6 SAS: Linear Regression Remedial Measures Stat 5100 Handout #6 SAS: Linear Regression Remedial Measures Example: Age and plasma level for 25 healthy children in a study are reported. Of interest is how plasma level depends on age. (Text Table

More information

CSE 417T: Introduction to Machine Learning. Lecture 6: Bias-Variance Trade-off. Henry Chai 09/13/18

CSE 417T: Introduction to Machine Learning. Lecture 6: Bias-Variance Trade-off. Henry Chai 09/13/18 CSE 417T: Introduction to Machine Learning Lecture 6: Bias-Variance Trade-off Henry Chai 09/13/18 Let! ", $ = the maximum number of dichotomies on " points s.t. no subset of $ points is shattered Recall

More information

Frequencies, Unequal Variance Weights, and Sampling Weights: Similarities and Differences in SAS

Frequencies, Unequal Variance Weights, and Sampling Weights: Similarities and Differences in SAS ABSTRACT Paper 1938-2018 Frequencies, Unequal Variance Weights, and Sampling Weights: Similarities and Differences in SAS Robert M. Lucas, Robert M. Lucas Consulting, Fort Collins, CO, USA There is confusion

More information