Intermediate Stata
Jeremy Craig Green
1 March 2011
3/29/2011

Advantages of Stata
Ubiquitous in economics and political science
Gaining popularity in health sciences
Large library of add-on modules
Version 11 markedly improves the Variables Manager, the .do file editor, factor variables, and merges
"I know that Stata does 2SLS right, I don't know if I trust the yahoo that coded it up for R."
http://www.yale.edu/statlab

Disadvantages of Stata
Proprietary (not open source)
Expensive (especially for MP versions)
Weaker graphics (compared to R)
Command-line focused (has a GUI, but it is difficult to use)
"Stata is great because it's just like DOS."

Getting Help
In Stata, type help followed by a command name (for example, help regress)
http://statlab.stat.yale.edu/help/
http://www.ats.ucla.edu/stat/stata/
http://data.princeton.edu/stata/
http://www.stata.com/statalist/
0. Introduction

Today's Workshop
1. Data management
2. .do and .log files
3. Data inspection
4. Variable creation
5. Merging data
6. Reshaping data
7. Graphics
8. Regression analysis

Data Management
Use Stat/Transfer software to convert Excel, SAS, and SPSS files into Stata format.
Use the compress command to store each variable in the smallest type that holds it without loss, so the dataset uses less memory.
Some very large datasets won't open in Stata due to memory limitations. In that case, open a subset of the dataset:
    use varlist using filename
2. Data Management

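A minimal sketch of loading only part of a dataset, assuming the bundled auto data stands in for a file too large to open in full. The temporary file name autocopy is hypothetical, and the block should be run in one go so the local macro stays defined.

```stata
sysuse auto, clear
tempfile autocopy                  /* hypothetical stand-in for a large file on disk */
save `autocopy'

use make price mpg using `autocopy', clear   /* load three variables only */
describe
compress                           /* shrink storage types where possible */
```

The same syntax accepts if and in qualifiers, for example use make price in 1/20 using filename, which limits the observations as well as the variables.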
.do files
.do files let you save a sequence of commands and run it as a program; you can run the whole file at once or only selected portions.
AVOID making changes to your original data interactively in the Stata Command window. Use .do files instead.
To open a .do file, use the File menu or the Do-file Editor button.
1. Programming/Project Management Tips

.log files
Syntax
Begin log file:
    log using filename.txt, text replace
End log file:
    log close

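One way to combine the two ideas is a .do file that opens and closes its own log. This is only a sketch: the log name mylog.txt and the commands between the log lines are placeholders. The capture prefix is there because log using stops with an error if a log is already open, which often happens when an earlier run ended partway through.

```stata
capture log close                  /* close any log left open by an earlier run */
log using mylog.txt, text replace

sysuse auto, clear
summarize price mpg

log close
```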
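Data Inspection
    cd "C:\Documents and Settings\Jeremy\My Documents\stata files\"   /* quote paths that contain spaces */
    clear
    set mem 80m
    log using mylog.txt, text replace
    sysuse census
    des           /* describe: variable names, types, labels */
    sum varlist   /* summarize: summary statistics */
    bro varlist   /* browse: open the Data Browser */
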
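A few more inspection commands that may be useful, sketched on the same census data. The choice of variables is purely illustrative.

```stata
sysuse census, clear
describe                           /* variable names, types, and labels */
summarize pop medage, detail       /* adds percentiles, skewness, kurtosis */
codebook region                    /* values, labels, and missing-value count */
list state region pop in 1/5       /* first five observations */
count if missing(medage)           /* observations with medage missing */
```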
Variable Creation
    g agesq = medage^2                /* creates variable equal to medage squared */
    sum pop                           /* shows summary stats for pop */
    scalar popmean = r(mean)          /* saves mean of pop to scalar popmean */
    /* create variable equal to 1 when pop > popmean and 0 otherwise */
    g dummy = 0
    replace dummy = 1 if pop > popmean
    /* how many states have population higher than average? */
    count if dummy == 1
    /* how many states NOT IN THE SOUTH have pop > popmean? */
    count if dummy == 1 & region != 3

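The r(mean) used above is one of several results that summarize leaves behind; return list shows them all. The sketch below also builds the indicator in one line, with a guard for missing values: Stata treats a missing value as larger than any number, so an unguarded pop > popmean would be true for a missing pop. The variable name above is hypothetical.

```stata
sysuse census, clear
summarize pop
return list                        /* shows r(N), r(mean), r(sd), ... */
scalar popmean = r(mean)

/* indicator is 1/0 where pop is known and missing where pop is missing */
generate byte above = (pop > popmean) if !missing(pop)
count if above == 1
display "states above the mean: " r(N)
```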
Variable Creation (cont.)
To create four dummies, we need to type those two commands four times. More importantly, the previous method generates 0s even when we have missing values.
    tab region, g(d)
This second method tabulates the variable region, showing a list of the four regions, and correctly creates four separate dummies, accounting for missing values.

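To see the difference between the two methods, the sketch below sets one region value to missing. That missing value is an assumption made purely for illustration (the shipped census data has none), and the names south_naive and south are hypothetical.

```stata
sysuse census, clear
replace region = . in 1            /* hypothetical missing value, illustration only */

/* two-command method: the missing observation is coded 0 */
generate south_naive = 0
replace south_naive = 1 if region == 3

/* one-line alternative: evaluated only where region is known */
generate byte south = (region == 3) if !missing(region)

/* tabulate with generate() also leaves the missing observation missing */
tabulate region, generate(d)
list state region south_naive south d3 in 1/3
```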
Merging Data
    sysuse census, clear
    keep state-popurban
    sort state                       /* required by the pre-version-11 merge syntax; merge 1:1 sorts as needed */
    save census1, replace
    sysuse census, clear
    keep state medage-divorce        /* note state is kept in both */
    sort state
    save census2, replace
    use census1, clear
    merge 1:1 state using census2    /* state is the key variable in both files */
    tab _merge                       /* _merge keeps track of how good merge was */

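Instead of inspecting _merge by eye, the version 11 merge syntax can check the result itself. The sketch below assumes every state should match in both files; it uses a temporary file (the name extra is hypothetical) so it runs on its own, and it should be run in one go.

```stata
sysuse census, clear
keep state medage-divorce
tempfile extra                     /* hypothetical name for the using file */
save `extra'

sysuse census, clear
keep state-popurban
/* assert(match) stops with an error unless every observation matches; */
/* nogenerate skips creating _merge */
merge 1:1 state using `extra', assert(match) nogenerate
describe
```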
Reshaping Data
    sysuse bplong, clear
    br
Suppose we want the difference in bp before and after treatment. This is difficult to calculate when the data are organized in long format, so we need to convert to wide format.
    reshape wide bp, i(patient sex agegrp) j(when)
    br
    g bpdiff = bp2 - bp1

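Once the difference is computed, the data can be returned to long format if later steps expect one row per patient and period. A sketch, reusing the same i() and j() variables as above:

```stata
sysuse bplong, clear
reshape wide bp, i(patient sex agegrp) j(when)
generate bpdiff = bp2 - bp1        /* after minus before */
summarize bpdiff

/* back to long; bpdiff is repeated on both rows of each patient */
reshape long bp, i(patient sex agegrp) j(when)
```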
Graphics
Draw a histogram:
    sysuse auto, clear
    histogram price
Create a scatter plot:
    scatter price mpg
Draw line of best fit (linear regression):
    twoway lfit price mpg
Put two graphs together:
    twoway (scatter price mpg) (lfit price mpg)
3. Analyzing Data

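A sketch of labeling the combined graph and saving it to disk. The title text and the file name price_mpg.png are placeholders, and the export formats available depend on the platform Stata is running on.

```stata
sysuse auto, clear
twoway (scatter price mpg) (lfit price mpg), title("Price vs. mileage") ytitle("Price") legend(off)
graph export price_mpg.png, replace
```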
Regression Analysis
cor var1 var2 computes the correlation between two variables:
    cor price mpg
regress var1 var2 estimates the effect of var2 on var1 by linear regression:
    reg price mpg
More complex models are also available (e.g., discrete choice, IV, HLM).

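Both commands leave their results behind for later use. The sketch below pulls out the slope and R-squared and checks that, with a single regressor, R-squared equals the squared correlation. The names rho and price_hat are hypothetical.

```stata
sysuse auto, clear
correlate price mpg
scalar rho = r(rho)                /* correlation between price and mpg */

regress price mpg
display "slope on mpg: " _b[mpg]
display "R-squared:    " e(r2)
display "rho squared:  " rho^2
predict price_hat                  /* fitted values (xb is the default) */
```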
Additional References
http://www.yale.edu/statlab

Questions?