PAM 4280/ECON 3710: The Economics of Risky Health Behaviors Fall 2015 Professor John Cawley TA Christine Coyer. Stata Basics for PAM 4280/ECON 3710

I. Introduction

Stata is one of the most commonly used statistical software packages among economists because of its breadth of estimation commands and its high-quality graphics. You will use Stata for all homework assignments and your research paper. Stata/MP (Version 14) is available from CISER; you can connect to the CISER research servers from any computer (http://ciser.cornell.edu/computing/manual/connects.htm). In addition, computers with Stata are available in the following Mann Library computer labs: Stone, B30A, and B30B.

In these notes, I summarize the features of Stata and provide examples of the most commonly used commands. Additionally, I discuss the National Longitudinal Study of Adolescent to Adult Health (Add Health) and the commands you will need to correctly analyze these survey data. Stata syntax is highlighted in blue, and examples are shown in text boxes.

II. Stata interface

The Stata interface includes separate panels for writing commands, reporting output, reviewing previous commands, and listing properties of the data in memory. In addition to the primary panels, you should also use the Help, Do-file Editor, and Data Browser buttons to open secondary windows with the help menu, do-file editor, and data browser, respectively.

You can interact with Stata directly by typing commands into the command line. However, you should always save your commands as a do-file for future use and replication. A do-file is simply a text file with the commands you wish to execute listed sequentially.

You can execute a do-file directly from the do-file editor. You also can execute do-files outside of the Stata interface. On Unix-like systems, type the following command at your terminal:

    stata -b do filename &

III. Stata files

Below is a list of the different types of Stata files. You will need to open Stata data files (.dta), write do-files (.do), and save the log of your output (.log).

    .do     Stata commands saved in a text file
    .dta    Stata data file
    .log    Log file with output from Stata commands
    .gph    Graph file
    .ado    Program file (many Stata commands are implemented as .ado files)

Logs

By default, Stata saves log files in the Stata Markup and Control Language (SMCL) format. SMCL files display correctly only in Stata's Viewer. Therefore, I recommend that you change the format of your log file to plain text. Plain-text log files can be saved with either the .log or the .txt suffix.

    set logtype text
    log using filename, replace
    log close

    log using "PAM4280_Fall2015_Log.txt", text replace
    log close

IV. Basic commands

These basic Stata commands are useful at the beginning of your do-file:

    clear all            clear everything from memory
    set more {on|off}    pause output until a key is pressed (on or off)
    #delimit {cr|;}      switch the command delimiter between carriage return and semicolon

The trace command traces the execution of programs (.ado files). It is best used for troubleshooting errors, but it creates a lot of output, most of which is unnecessary.

    set trace {on|off}   trace execution of programs (on or off)

The exit command terminates the current process. Include exit at the end of your do-file to return control to the command line in the primary panel. You also can use exit to stop a process early, for example from within a loop.

There are two ways to include comments in your do-file. First, you can start a line with either an asterisk (*) or two forward slashes (//). Second, you can write a comment between /* and */; this format allows a comment to span multiple lines.

    clear all
    set more off
    * Example of a comment in a do-file
    /* Example of a
       multi-line comment in a do-file */

The help command launches a viewer that displays Stata's help pages. You can also open the help page for a specific command (help command). Each help page lists the command's syntax, description, options, and examples. Most Stata help pages are also available online; search Google for "Stata help" followed by the command name. For example, searching for "Stata help use" yields the following URL: http://www.stata.com/help.cgi?use.

V. Data

Loading data

The data posted on Blackboard for your homework assignments and research paper are in Stata format (PAM4280_Fall2015_Data.dta). The use command loads a Stata-formatted data file (.dta) into memory.

    use filename, clear

The data file should be in your current directory, or you will need to include the file path with the filename. You can check your current directory by typing pwd (present working directory) in the command window, and you can change it by typing cd directory_path. Examples that end in a semicolon assume that #delimit ; is in effect.

    cd "\\rschfs1x\userrs\cnc45_rs\documents\pam4280";
    use "PAM4280_PS_Data.dta", clear;

Merging data

You may wish to add variables or waves of data to the Add Health file for your research paper. To merge the Wave IV Add Health data with other waves, use a one-to-one merge on a variable that uniquely identifies respondents in both files:

    merge 1:1 varlist using filename [, keepusing(varlist)]

The match results are saved in a new variable, _merge. You can either require particular match results with the assert() option or use the keep command afterward to retain the observations you want.

    [, assert(results)]

    1  master   Appeared in master only
    2  using    Appeared in using only
    3  match    Appeared in both

    merge 1:1 AID using "AddHealth_WaveIII_tests.dta", keepusing(pvtstd3c);
    keep if inlist(_merge,1,3);
    drop _merge;

Generating new variables

You will need to generate new variables in the homework assignments. Use the generate command to create a new variable and the replace command to change the values of an existing one.

    gen newvar = exp [if]
    replace oldvar = exp [if]

    gen male = BIO_SEX4 == 1;
    gen blackwhite = .;
    replace blackwhite = 0 if white == 1;
    replace blackwhite = 1 if black == 1;

VI. Add Health data

For all homework assignments and your research paper, you will use publicly available data from Wave IV of the National Longitudinal Study of Adolescent to Adult Health (Add Health). Add Health is a longitudinal survey administered by the University of North Carolina at Chapel Hill and designed for the study of the health and behaviors of American adolescents and young adults. Detailed information about the data is available on the Add Health website: http://www.cpc.unc.edu/projects/addhealth.
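Returning briefly to the one-to-one merge in Section V: the _merge codes are easier to picture with a toy example. Below is a minimal Python sketch of the matching logic. It is not Stata code. It merges two small lists of records on a key, tags each row with 1, 2, or 3 the way merge fills in _merge, and then mimics keep if inlist(_merge,1,3) and drop _merge. The function name merge_1to1, the AID values, and the scores are all hypothetical. None stands in for Stata's missing values. The rule that master values win for shared variable names reflects merge's default behavior when the update option is not used.

```python
# Hypothetical illustration of `merge 1:1` matching logic; not Stata's implementation.
def merge_1to1(master, using, key, keepusing=None):
    """One-to-one merge of two lists of dicts on `key`.

    Each output row gets a '_merge' code as in Stata:
    1 = master only, 2 = using only, 3 = matched.
    Variables absent from one side are filled with None (Stata's missing).
    """
    def index(rows, name):
        by_key = {}
        for row in rows:
            if row[key] in by_key:  # 1:1 requires a unique key on both sides
                raise ValueError(f"{key} does not uniquely identify {name} data")
            by_key[row[key]] = row
        return by_key

    m_idx = index(master, "master")
    u_idx = index(using, "using")
    # keepusing() restricts which using variables are brought in
    u_vars = keepusing or sorted({v for r in using for v in r} - {key})
    m_vars = sorted({v for r in master for v in r} - {key})

    merged = []
    for k, row in m_idx.items():
        if k in u_idx:
            extra = {v: u_idx[k].get(v) for v in u_vars}
            # master values override using values for shared names (no update option)
            merged.append({**extra, **row, "_merge": 3})
        else:
            merged.append({**{v: None for v in u_vars}, **row, "_merge": 1})
    # using-only observations are appended at the end
    for k, row in u_idx.items():
        if k not in m_idx:
            merged.append({**{v: None for v in m_vars}, key: k,
                           **{v: row.get(v) for v in u_vars}, "_merge": 2})
    return merged


# Hypothetical toy data
master = [{"AID": "A1", "male": 1}, {"AID": "A2", "male": 0}]
using = [{"AID": "A1", "pvtstd3c": 101, "other": 7},
         {"AID": "A3", "pvtstd3c": 95, "other": 8}]
merged = merge_1to1(master, using, "AID", keepusing=["pvtstd3c"])

# keep if inlist(_merge,1,3); drop _merge
kept = [{k: v for k, v in r.items() if k != "_merge"}
        for r in merged if r["_merge"] in (1, 3)]
print(kept)
```

Respondent A3 appears only in the using file, so it gets code 2 and is dropped by the keep step. Respondent A2 has no Wave III score, so it survives with a missing value.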

The Add Health data are based on a stratified random sample of all high schools in the US. The in-home sample comprises a main sample of adolescents from each community as well as selected oversamples.

Declare survey data

If you ignore the sample design of the Add Health study, your results will be wrong. Because some groups are oversampled, ignoring the probability weights can bias point estimates such as means and coefficients. Ignoring the clustering misstates their standard errors. Stata has a well-established set of commands that automatically adjust your estimates once you declare the survey design. To declare the survey design, you must give Stata the primary sampling units (a.k.a. clusters) and the probability weights.

    svyset [psu] [weight] [, options]

    svyset CLUSTER2 [pweight=GSWGT4_2];
    svydescribe;

    svydescribe;

    Survey: Describing stage 1 sampling units

          pweight: GSWGT4_2
              VCE: linearized
      Single unit: missing
         Strata 1: <one>
             SU 1: CLUSTER2
            FPC 1: <zero>

                                        #Obs per Unit
                                  ----------------------------
      Stratum    #Units     #Obs      min     mean      max
      --------  --------  --------  --------  --------  --------
             1       132     5,114        5     38.7       106
      --------  --------  --------  --------  --------  --------
             1       132     5,114        5     38.7       106

VII. Describe data

Before you begin analyzing the Add Health data, use these commands to review the contents of the data file:

    desc [varlist]            describe the data in memory
    mdesc varlist             describe missing values
    summ [varlist]            summary statistics
    tab varname               one-way table of frequencies
    tab varname1 varname2     two-way table of frequencies

(mdesc is a user-written command; install it once with ssc install mdesc.)

VIII. Analyze data

The homework assignments require that you estimate and interpret means, linear regressions, and probit models.

    mean varlist [, over(varlist)]
    reg depvar [indepvars]
    probit depvar [indepvars]

Recall that a probit model fits

$$\Pr(y_j \neq 0 \mid x_j) = \Phi(x_j\beta)$$

where $\Phi$ is the cumulative distribution function of the standard normal distribution.

Stata stores some results in memory after a command is executed (e.g., the vector of estimated means). Two of the most commonly used stored results are the coefficient and the standard error (_b[varname] and _se[varname]). Use the following commands to list the complete set of stored results:

    return list      list r-class stored results
    ereturn list     list e-class (estimation) stored results

Survey data

To adjust your estimated means and coefficients for the sample design of the Add Health study, type your Stata command after the svy prefix. Review the Stata Survey Data Reference Manual for a complete list of the available survey commands.

    svy: command

Many of the estimates that you will be asked to produce in the homework assignments are for a subsample of the Add Health respondents. For example, you will be asked about condom use among young adults who have ever had sex. The correct way to estimate results for a subsample is to use the subpopulation option (subpop) with the svy prefix. Other methods, such as if qualifiers, drop the observations outside the subsample before variance estimation. As a result, Stata miscounts the primary sampling units and produces incorrect standard errors.

    svy, subpop(varname): command

    svy, subpop(sex_12m): mean condom_12m, over(gender);

    svy, subpop(sex_12m): mean condom_12m, over(gender);
    (running mean on estimation sample)

    Survey: Mean estimation

    Number of strata =       1      Number of obs     =      4,901
    Number of PSUs   =     132      Population size   = 21,071,729
                                    Subpop. no. obs   =      4,362
                                    Subpop. size      = 18,690,865
                                    Design df         =        131

          Male: gender = Male
        Female: gender = Female

    --------------------------------------------------------------
                 |             Linearized
            Over |       Mean   Std. Err.     [95% Conf. Interval]
    -------------+------------------------------------------------
    condom_12m   |
            Male |   .6003872   .0143038      .5720908    .6286835
          Female |   .4687042   .0134242       .442148    .4952605
    --------------------------------------------------------------

IX. Postestimation

You will use postestimation commands to estimate marginal effects and elasticities and to test hypotheses. Use the margins command to estimate marginal effects and elasticities. The marginal effect of an independent variable is the derivative of the probability function with respect to that variable. That is, it is the derivative of $\Pr(y_j \neq 0 \mid x_j) = \Phi(x_j\beta)$ with respect to $x_{jk}$, which for a probit equals $\phi(x_j\beta)\,\beta_k$, where $\phi$ is the standard normal density. By default, margins evaluates the marginal effect for each observation and reports the average. The atmeans option instead evaluates it at the means of the covariates. Use the vce(unconditional) option to tell Stata to account for the sampling of the covariates; this option is required to correctly estimate standard errors from survey data.

    margins, vce(unconditional) dydx(varlist)
    margins, vce(unconditional) eyex(varlist) atmeans
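To see what the two margins commands above compute, here is a minimal Python sketch of the point estimates for a probit with one continuous regressor. It is not Stata code. It covers three quantities:

- the average over observations of $\phi(x_j\beta)\beta_k$, which is what dydx() reports by default;
- the same derivative evaluated at the covariate means, which is what atmeans reports;
- the elasticity $\phi(\bar{x}\beta)\beta_k\bar{x}_k/\Phi(\bar{x}\beta)$, which is what eyex() reports together with atmeans.

The data, the coefficients, and all function names are hypothetical. The optional weights argument w assumes that margins averages with the estimation weights after a weighted (svy) fit. Standard errors, including the vce(unconditional) adjustment, are not reproduced. For a 0/1 regressor entered with factor-variable notation (i.varname), margins reports a discrete change rather than this derivative.

```python
import math


def norm_cdf(z):
    """Standard normal CDF, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def norm_pdf(z):
    """Standard normal density, phi(z) = dPhi/dz."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def xb(x, beta):
    """Linear index x_j * beta (x includes a leading 1 for the constant)."""
    return sum(xi * bi for xi, bi in zip(x, beta))


def weighted_mean(values, w):
    return sum(v * wi for v, wi in zip(values, w)) / sum(w)


def probit_ame(X, beta, k, w=None):
    """Average marginal effect of regressor k: mean of phi(x_j b) * b_k."""
    w = w or [1.0] * len(X)
    return weighted_mean([norm_pdf(xb(x, beta)) * beta[k] for x in X], w)


def probit_me_atmeans(X, beta, k, w=None):
    """Marginal effect of regressor k evaluated at the (weighted) means of x."""
    w = w or [1.0] * len(X)
    xbar = [weighted_mean(col, w) for col in zip(*X)]
    return norm_pdf(xb(xbar, beta)) * beta[k]


def probit_eyex_atmeans(X, beta, k, w=None):
    """Elasticity d ln Pr / d ln x_k at the means: ME * xbar_k / Pr."""
    w = w or [1.0] * len(X)
    xbar = [weighted_mean(col, w) for col in zip(*X)]
    z = xb(xbar, beta)
    return norm_pdf(z) * beta[k] * xbar[k] / norm_cdf(z)


# Hypothetical data: a constant and one continuous regressor (e.g., age)
X = [[1.0, 20.0], [1.0, 25.0], [1.0, 30.0]]
beta = [-1.0, 0.05]  # hypothetical probit coefficients

print("dydx (average):", probit_ame(X, beta, 1))
print("dydx (atmeans):", probit_me_atmeans(X, beta, 1))
print("eyex (atmeans):", probit_eyex_atmeans(X, beta, 1))
```

Because $\phi$ is nonlinear, the average marginal effect and the marginal effect at the means generally differ. This is why it matters which of the two you report.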

The ttest command, the usual way to run t tests, cannot be used with survey data. Instead, use the lincom (linear combination) command after a svy estimation to test the null hypothesis that two coefficients are equal (e.g., $H_0\colon \beta_{\text{male}} - \beta_{\text{female}} = 0$).

    lincom _b[varname1] - _b[varname2]

X. Save results

There are several methods for saving your results to Word or Excel. The homework assignment solutions use both outreg and putexcel. (outreg is a user-written command; install it with ssc install outreg.)

    outreg [varlist] using filename

    outreg using "PAM4280_Fall2015_Tables.doc", se replace title("table title") ctitle("",ctitle1);
    outreg using "PAM4280_Fall2015_Tables.doc", se merge title("table title") ctitle("",ctitle1);

    putexcel set filename [, sheet(sheetname) modify keepcellformat]
    putexcel cellexplist

    putexcel set PAM4280_Fall2015_PS.xlsx, sheet("PS") keepcellformat modify;
    matrix b_se = (_b[male], _se[male]);
    putexcel A1 = matrix(b_se);

References

Analyzing data from sample surveys. UNC: Carolina Population Center. http://www.cpc.unc.edu/research/tools/data_analysis/statatutorial/sample_surveys/index.html

Cameron, A. C., & Trivedi, P. K. (2010). Microeconometrics Using Stata, Revised Edition. StataCorp LP, College Station, Texas.

Gentzkow, M., & Shapiro, J. M. (2014). Code and Data for the Social Sciences: A Practitioner's Guide. University of Chicago mimeo. http://faculty.chicagobooth.edu/matthew.gentzkow/research/codeanddata.pdf

Introduction to Stata. UNC: Carolina Population Center. http://www.cpc.unc.edu/research/tools/data_analysis/statatutorial

Long, J. S. (2009). The Workflow of Data Analysis Using Stata. StataCorp LP, College Station, Texas.

The National Longitudinal Study of Adolescent to Adult Health (Add Health). UNC: Carolina Population Center. http://www.cpc.unc.edu/projects/addhealth

Resources to help you learn and use Stata. UCLA: Statistical Consulting Group. http://www.ats.ucla.edu/stat/stata/

Rodriguez, G. (2015). Stata Tutorial. Princeton University. http://data.princeton.edu/stata/

StataCorp (2013). Stata Survey Data Reference Manual: Release 13. Statistical Software. College Station, TX: StataCorp LP. http://www.stata.com/manuals13/svy.pdf