PAM 4280/ECON 3710: The Economics of Risky Health Behaviors
Fall 2015
Professor John Cawley
TA Christine Coyer

Stata Basics for PAM 4280/ECON 3710

I. Introduction

Stata is one of the most commonly used statistical software packages among economists because of its breadth of estimation commands and quality graphics. You will use Stata for all homework assignments and your research paper. Stata/MP (Version 14) is available from CISER; you can connect to the CISER research servers from any computer (http://ciser.cornell.edu/computing/manual/connect.shtm). In addition, computers with Stata are available in the following Mann Library computer labs: Stone, B30A, and B30B.

In these notes, I summarize the features of Stata and provide examples of the most commonly used commands. Additionally, I discuss the National Longitudinal Study of Adolescent to Adult Health (Add Health) and the commands you will need to correctly analyze these survey data. Stata syntax is highlighted in blue and examples are shown in text boxes.

II. Stata interface

The Stata interface includes separate panels for writing commands, reporting output, reviewing previous commands, and listing properties of the data in memory. In addition to the primary panels, you should also use the help, do-file, and data buttons to open secondary panels with the help menu, do-file editor, and data browser, respectively.
You can interact with Stata directly by typing commands into the command line. However, you should always save your commands as a do file for future use and replication. A do file is simply a text file with the commands you wish to execute listed sequentially.
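As a minimal sketch of what such a do file might contain, the example below lists a few commands in the order they should run. The file name example.do and the auto dataset (an example dataset shipped with Stata) are assumptions chosen for illustration; they are not part of the course materials.

```stata
* example.do: a hypothetical do file; the commands run from top to bottom
clear all
set more off

* Load an example dataset that ships with Stata (not the course data)
sysuse auto, clear

* Review the contents of the data and summarize two variables
describe
summarize price mpg
```

Saving these lines in a text file and running the file reproduces the same output each time, which is the point of keeping commands in a do file rather than typing them interactively.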
You can execute a do file directly from the do-file editor. You can also execute do files outside of the Stata interface by typing the following command into your command prompt or terminal:

    stata -b do filename &

III. Stata files

Below is a list of the different types of Stata files. You will need to open Stata data files (.dta), write do files (.do), and save the log of your output (.log).

    .do     Stata commands saved in a text file
    .dta    Stata data file
    .log    Log file with output from Stata commands
    .gph    Graph file
    .ado    Program (most Stata commands are implemented as ado files)

Logs

By default, Stata saves log files in the Stata Markup and Control Language (SMCL) format. SMCL logs display correctly only inside Stata (in the Viewer). Therefore, I recommend that you change the format of your log file to plain text. Plain text log files can be saved with either the .log or .txt suffix.

    set logtype text
    log using filename, replace
    log close

Example:

    log using "PAM4280_Fall2015_Log.txt", text replace
    log close

IV. Basic commands

These basic Stata commands are useful at the beginning of your do file.

    clear all             clear everything from memory
    set more {on|off}     pause until any key is pressed (on or off)
    #delimit {cr|;}       change the delimiter between carriage return and semicolon

The trace command will trace the execution of programs (ado files). This command is best used for troubleshooting errors, but it creates a lot of output, most of which is unnecessary.

    set trace {on|off}    trace execution of programs (on or off)
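The #delimit command listed above is easiest to understand from a short example. The sketch below switches the delimiter to a semicolon so that one command can wrap over two lines, then switches back to the carriage return. It assumes the auto example dataset shipped with Stata rather than the course data. Note that #delimit only takes effect inside a do file, not at the interactive command line. Examples in these notes that end each command with a semicolon presumably rely on this setting.

```stata
* Load an example dataset that ships with Stata (not the course data)
sysuse auto, clear

* From here on, a command ends at the semicolon, not at the line break
#delimit ;

summarize price mpg weight
    if foreign == 1 ;

tabulate rep78 ;

#delimit cr

* Back to the default: one command per line
summarize price
```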
The exit command (exit) terminates the current process. Include the exit command at the end of your do file to return to the command line in the primary panel. You can also use the exit command to terminate a process within a loop, for example.

There are two ways to include comments in your do file. First, you can start a new line with either an asterisk (*) or two forward slashes (//). Second, you can write a comment between /* and */; this format allows you to wrap comments over multiple lines.

    clear all
    set more off
    * Example of a comment in a do file
    /* Example of a multi-line
       comment in a do file */

The help command launches a viewer to display the Stata help page. You can also launch the help page for a specific command (help command). Each help page lists the command syntax, description, options, and examples. Most Stata help pages are also available online: Google "Stata help command". For example, searching for "Stata help use" yields the following URL: http://www.stata.com/help.cgi?use
V. Data

Loading data

The data posted on Blackboard for your homework assignments and research paper are in Stata format (PAM4280_Fall2015_Data.dta). The use command loads a Stata-formatted data file (.dta) into memory.

    use filename, clear

The data file should be in your current directory; otherwise, you will need to include the file path with the filename. You can check your current directory by typing pwd (present working directory) in the command window. You can also change your current directory by typing cd directory_path.

    cd "\\rschfs1x\userrs\cnc45_rs\documents\pam4280";
    use "PAM4280_PS_Data.dta", clear;

Merging data

You may wish to add additional variables or waves of data to the Add Health file for your research paper. To merge the Wave IV Add Health data with other waves of data, use the one-to-one merge command.

    merge 1:1 varlist using filename [, keepusing(varlist)]

The merge match results are saved in the _merge system variable. You can either assert a required match as a command option or use the keep command to retain the desired match results.

    [, assert(results)]

    1    master    Appeared in master only
    2    using     Appeared in using only
    3    match     Appeared in both

    merge 1:1 AID using "AddHealth_WaveIII_tests.dta", keepusing(pvtstd3c);
    keep if inlist(_merge,1,3);
    drop _merge;

Generating new variables

You will need to generate new variables in the homework assignments. Use the generate and replace commands to create new variables.

    gen newvar = exp
    replace oldvar = exp [if]
    gen male = BIO_SEX4 == 1;
    gen blackwhite = .;
    replace blackwhite = 0 if white == 1;
    replace blackwhite = 1 if black == 1;

VI. Add Health data

For all homework assignments and your research paper, you will use publicly available data from Wave IV of the National Longitudinal Study of Adolescent to Adult Health (Add Health). Add Health is a longitudinal survey administered by the University of North Carolina at Chapel Hill, and it is designed for the study of the health and behaviors of American adolescents and young adults. Detailed information about the data is available on the Add Health website: http://www.cpc.unc.edu/projects/addhealth
The Add Health data are based on a stratified random sample of all high schools in the U.S. The in-home sample comprises a main sample of adolescents from each community as well as selected oversamples.

Declare survey data

Your estimates would be biased if you did not account for the sample design of the Add Health study. Stata has a well-established set of commands that will automatically adjust your estimates after you declare the survey design. To declare the survey design, you must give Stata the primary sampling units (a.k.a. clusters) and the probability weights.

    svyset [psu] [weight] [, options]

    svyset CLUSTER2 [pweight = GSWGT4_2];
    svydescribe;

    svydescribe;

    Survey: Describing stage 1 sampling units

          pweight: GSWGT4_2
              VCE: linearized
      Single unit: missing
         Strata 1: <one>
             SU 1: CLUSTER2
            FPC 1: <zero>

                                      #Obs per Unit
                                ----------------------------
     Stratum   #Units     #Obs      min     mean      max
    -------- -------- -------- -------- -------- --------
           1      132    5,114        5     38.7      106
    -------- -------- -------- -------- -------- --------
           1      132    5,114        5     38.7      106

VII. Describe data

Before you begin analyzing the Add Health data, you should use these commands to review the contents of the data file.

    desc [varlist]             describe the data in memory
    mdesc varlist              describe missing values (user-written; ssc install mdesc)
    summ [varlist]             summary statistics
    tab varname                one-way table of frequencies
    tab varname1 varname2      two-way table of frequencies

VIII. Analyze data

The homework assignments require that you estimate and interpret means, linear regressions, and probit models.

    mean varlist [, over(varlist)]
    reg depvar [indepvars]
    probit depvar [indepvars]

Recall that a probit model fits the following:

    $\Pr(y_j \neq 0 \mid x_j) = \Phi(x_j \beta)$

where $\Phi$ is the cumulative distribution function of the standard normal distribution.

Stata stores some results in memory after a command is executed (e.g., the vector of means). Two of the most commonly used stored results are the coefficient and the standard error (_b[varname] and _se[varname]). Use the following postestimation commands to review the complete list of stored results.

    return list       return stored results
    ereturn list      return estimation results

Survey data

To adjust your estimated means and coefficients for the sample design of the Add Health study, type your Stata command after the svy prefix. Review the Stata Survey Data Reference Manual for a complete list of the available survey commands.
    svy: command

Many of the estimates that you will be asked to produce in the homework assignments are for a subsample of the Add Health survey respondents. For example, you will be asked about condom use among young adults who have ever had sex. The correct way to estimate results for a subsample of observations is to use the subpopulation option (subpop) with the svy prefix. Other methods (e.g., if statements) will incorrectly identify the number of primary sampling units and therefore produce biased variance calculations.

    svy, subpop([varname]): command

    svy, subpop(sex_12m): mean condom_12m, over(gender);

    svy, subpop(sex_12m): mean condom_12m, over(gender);
    (running mean on estimation sample)

    Survey: Mean estimation

    Number of strata =       1       Number of obs   =       4,901
    Number of PSUs   =     132       Population size =  21,071,729
                                     Subpop. no. obs =       4,362
                                     Subpop. size    =  18,690,865
                                     Design df       =         131

             Male: gender = Male
           Female: gender = Female

    --------------------------------------------------------------
                 |             Linearized
            Over |       Mean   Std. Err.     [95% Conf. Interval]
    -------------+------------------------------------------------
    condom_12m   |
            Male |   .6003872   .0143038      .5720908    .6286835
          Female |   .4687042   .0134242       .442148    .4952605
    --------------------------------------------------------------

IX. Postestimation

You will use postestimation commands to construct estimates of marginal effects and elasticities, as well as to test hypotheses.

Use the margins command to estimate marginal effects and elasticities. The marginal effect of an independent variable is the derivative of the probability function (i.e., the derivative of $\Pr(y_j \neq 0 \mid x_j) = \Phi(x_j \beta)$ with respect to $x$). By default, Stata evaluates the margins command for each observation and reports the average. Use the vce(unconditional) option to tell Stata to allow for the sampling of covariates; this option is required to correctly estimate standard errors from survey data.

    margins, vce(unconditional) dydx(varlist)
    margins, vce(unconditional) eyex(varlist) atmeans
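The steps above (declare the design, estimate a probit with the svy prefix, then request average marginal effects) can be sketched as follows. This is only an illustration under stated assumptions: it uses nhanes2d, an example dataset that Stata downloads over the internet and that already has its survey design declared, and the variables highbp, age, and female belong to that dataset, not to Add Health.

```stata
* Sketch only: nhanes2d is a Stata example dataset (needs internet access),
* not the Add Health file; its survey design is already declared.
webuse nhanes2d, clear
svyset                          // display the declared survey design

* Survey-adjusted probit of high blood pressure on age and sex
svy: probit highbp age female

* Average marginal effects with design-based standard errors
margins, vce(unconditional) dydx(age female)
```

With the course data, the same pattern would apply after the svyset command shown earlier, with the homework variables in place of the ones assumed here.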
The standard postestimation command for t-tests (ttest) is not available after a svy command. Use the lincom (linear combination) command to calculate a t-test for the null hypothesis that two coefficients are equal (e.g., $H_0: \beta_{male} - \beta_{female} = 0$).

    lincom _b[varname1] - _b[varname2]

X. Save results

There are a number of different methods for saving your results to Word or Excel. The homework assignment solutions will use both outreg (a user-written command) and putexcel.

    outreg [varlist] using filename

    outreg using "PAM4280_Fall2015_Tables.doc", se replace title("table title") ctitle("",ctitle1);
    outreg using "PAM4280_Fall2015_Tables.doc", se merge title("table title") ctitle("",ctitle1);

    putexcel set filename, modify [, keepcellformat]
    putexcel cellexplist

    putexcel set PAM4280_Fall2015_PS.xlsx, sheet("PS") keepcellformat modify;
    putexcel A1 = matrix(_b[male],_se[male]);
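To connect the lincom test from the postestimation section with saving results: lincom leaves its point estimate and standard error in r(), so they can be inspected and then passed on to whichever export command you use. The sketch below is illustrative only. It assumes the nhanes2d example dataset (downloaded by webuse, with its survey design already declared), and the comparison of the height and weight coefficients is chosen purely to show the syntax.

```stata
* Sketch only: nhanes2d is a Stata example dataset (needs internet access),
* not the Add Health file.
webuse nhanes2d, clear

* Survey-adjusted linear regression
svy: regress highbp height weight age female

* Test H0: the height and weight coefficients are equal
lincom _b[height] - _b[weight]

* lincom stores its results in r(); list them, then display two of them
return list
display "difference = " r(estimate) "   std. err. = " r(se)
```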
References

Analyzing data from sample surveys. UNC: Carolina Population Center, http://www.cpc.unc.edu/research/tools/data_analysis/statatutorial/sample_surveys/index.html

Cameron, A. C., & Trivedi, P. K. (2010). Microeconometrics Using Stata, Revised Edition. StataCorp LP. College Station, Texas.

Gentzkow, M., & Shapiro, J. M. (2014). Code and Data for the Social Sciences: A Practitioner's Guide. University of Chicago mimeo, http://faculty.chicagobooth.edu/matthew.gentzkow/research/codeanddata.pdf

Introduction to Stata. UNC: Carolina Population Center, http://www.cpc.unc.edu/research/tools/data_analysis/statatutorial

Long, J. S. (2009). The Workflow of Data Analysis Using Stata. StataCorp LP. College Station, Texas.

The National Longitudinal Study of Adolescent to Adult Health (Add Health). UNC: Carolina Population Center, http://www.cpc.unc.edu/projects/addhealth

Resources to help you learn and use Stata. UCLA: Statistical Consulting Group, http://www.ats.ucla.edu/stat/stata/

Rodriguez, G. (2015). Stata Tutorial. Princeton University, http://data.princeton.edu/stata/

StataCorp (2013). Stata Survey Data Reference Manual: Release 13. Statistical Software. College Station, TX: StataCorp LP. http://www.stata.com/manuals13/svy.pdf