Workshop for empirical trade analysis December 2015 Bangkok, Thailand Cosimo Beverelli (WTO) Rainer Lanz (WTO)
Content
  a. Resources
  b. Stata windows
  c. Organization of the Bangkok_Dec_2015\Stata folder
  d. The directory_definition do file
  e. Datasets used in this introduction to Stata
  f. Do files
  g. Importing data into Stata
  h. Basic commands
  i. Merging datasets
  j. Macros
  k. Loops
  l. Graphics
a. Resources
  1. Stata help and Stata manual
  2. A variety of books covering Stata exist
  Web resources:
  1. Germán Rodríguez's webpage: data management, graphics and programming
  2. UCLA IDRE webpage: very comprehensive, covering all sorts of topics (data management, analysis, ...) with several examples and an FAQ
  3. Statalist: typically accessed via a Google search
b. Stata windows
[Screenshot of the Stata interface, annotated as follows]
  - Actions taken in the session
  - List of variables and labels
  - Type commands here
  - Some properties of the variables
c. Organization of the Bangkok_December_2015\Stata folder
  The Bangkok_December_2015\Stata folder contains the following subfolders:
  - data
  - do_files
  - results
  Do not change the name of the folder or of the subfolders.
  The first thing to do is to define your own working directory (see slide d.)
d. The directory_definition do file
  - To make things easy for all of us, we define a path for the working directory that can be changed easily and needs to be changed only once
  - When you open Stata, the first thing to do is to open the directory_definition do file in the do-file editor (click the do-file editor icon or press Ctrl+9)
  - The directory_definition do file sets this path; you have to change it to your own path
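The screenshot of the do file is not reproduced here, so the following is only a sketch of what such a file might contain. The global name ZAF is taken from the paths used later in these slides; the folder path is a placeholder, and the workshop's actual file may differ.

```stata
* directory_definition.do (illustrative sketch, not the workshop's actual file)
* Replace the placeholder path with the location of the folder on your machine.
global ZAF "C:/Users/yourname/Bangkok_December_2015/Stata"   // hypothetical path

* Check what the global expands to
display "$ZAF"
display "$ZAF/data/Introduction_Stata"
```

Forward slashes work on all platforms in Stata. They also avoid a known pitfall: a backslash placed directly before a macro reference suppresses the macro expansion.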
e. Datasets and do files used in this introduction to Stata
  To apply some of the Stata commands described in this presentation, we will use two datasets:
  - WDI.csv: a very small subset of the World Development Indicators
  - WB_ES.xls: derived from the World Bank Enterprise Surveys
  You can find the datasets in the data directory: "$ZAF\data\Introduction_Stata"
  There are also 4 do files:
  - 01_data_intro_stata
  - 02_descriptive
  - 03_reshape_merge
  - 04_loops
  You can find the do files in the directory: "$ZAF\do_files\Introduction_Stata"
f. Do files
  - A do file is a set of Stata commands typed in a plain text file
  - When you work with Stata, always use do files, e.g. one do file for creating your master dataset and one do file for regressions
  - Do files can also be used to set globals and directories, or to run a series of different do files one after another
Typical commands at the beginning of each do file:
  clear all                    /* removes all data in the current Stata session */
  set more off, perm           /* prevents Stata from pausing while running a do file */
  capture log close            /* closes any open log file; capture ignores the error if none is open */
  cd directory                 /* sets the working directory, e.g. "$ZAF\data" */
  log using filename, replace  /* useful for long do files */
  use dataset.dta, clear       /* opens a dataset in Stata format (.dta); quotes ("") are not needed unless the path contains spaces */
Notes
  - * at the start of a line makes Stata treat the whole line as a comment
  - /* text */ makes Stata treat text as a comment (text can span several lines)
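As a rough illustration of how these opening commands fit together, here is a minimal do-file header. The log file name is a placeholder, and auto.dta (an example dataset shipped with Stata) stands in for the workshop datasets.

```stata
* Illustrative do-file header; the log name is a placeholder
clear all
set more off                               // without perm: applies to this session only
capture log close                          // no error if no log is open
log using "example_session", replace text  // hypothetical log file in the working directory

/* auto.dta ships with Stata and stands in
   for the workshop datasets */
sysuse auto, clear
describe

log close
```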
g. Importing data into Stata
  insheet using filename.csv, clear
  - Typically used for text files that are either comma- or tab-separated
  import excel using filename.xls, sheet("Sheet1") firstrow clear
  - Reads Excel files directly into Stata
  - Lets you specify the variables, cell range and worksheet to import
  Copy-paste is strongly discouraged. The accuracy of copied numbers depends on:
  - How the data are formatted in Excel, i.e. how many digits are shown
  - Your settings in Stata (use set type double before copying)
  To save the dataset (in Stata format): save filename, replace
  Exercise: execute the do file 01_data_into_stata.do
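The import commands can be tried without the workshop files. The sketch below first writes a small CSV and Excel file from auto.dta (shipped with Stata) and then reads them back in; all file names are placeholders. Note that insheet has been superseded by import delimited since Stata 13, although it still works.

```stata
sysuse auto, clear
keep make price mpg

* Write the data out so that there is something to import (placeholder file names)
export delimited using "auto_subset.csv", replace
export excel using "auto_subset.xls", sheet("Sheet1") firstrow(variables) replace

* Comma-separated text file; variable names are read from the first row
insheet using "auto_subset.csv", clear
describe

* Excel file; firstrow treats the first row as variable names
import excel using "auto_subset.xls", sheet("Sheet1") firstrow clear
describe

* Save in Stata format
save "auto_subset", replace
```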
h. Basic commands
  Installing packages
    ssc install (or findit) packagename
  Identify missing values (represented by . (dot) or an empty cell)
    inspect varlist (codebook varlist)
  Identify duplicate observations
    duplicates (report/drop/tag/list) varlist
  Identify the number of unique values
    unique varlist (user-written, available from SSC; also codebook for a single variable)
  Browse the dataset
    browse varlist
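A short sketch of these checks on auto.dta, the example dataset shipped with Stata (used here only as a stand-in for the workshop data). The user-written unique command and the interactive browse are left out so that the lines run on a plain installation.

```stata
sysuse auto, clear

* Missing values: rep78 has some missing observations in this dataset
inspect rep78
codebook rep78
count if missing(rep78)

* Duplicates: report whether any car name appears more than once
duplicates report make

* Number of distinct values of a single variable
codebook foreign
```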
Other basic commands
  - describe
  - generate
  - destring/tostring
  - replace
  - rename (alternative: renvars)
  - keep
  - drop
  The list could go on. What is important to keep in mind is that, in case of doubt, you can always use the help: help command
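The commands listed above might be combined as follows. This is only a sketch on auto.dta (shipped with Stata); the new variable names are made up for the example.

```stata
sysuse auto, clear
describe

* Create and modify a variable
generate price_k = price / 1000
replace price_k = round(price_k, 0.1)

* Rename a variable
rename mpg miles_per_gallon

* Convert numeric to string and back
tostring price, generate(price_str)
destring price_str, generate(price_num)

* Keep selected variables, then drop observations with missing rep78
keep make price price_k price_num miles_per_gallon rep78
drop if missing(rep78)
```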
List of useful operators commonly used in expressions

  Arithmetic                  Logical              Relational
  +  add                      !  not (also ~)      ==  equal
  -  subtract                 |  or                !=  not equal (also ~=)
  *  multiply                 &  and               <   less than
  /  divide                                        <=  less than or equal
  ^  raise to power                                >   greater than
  +  string concatenation                          >=  greater than or equal
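A few of these operators in use, sketched on auto.dta (shipped with Stata). The variable names created here are illustrative only.

```stata
sysuse auto, clear

* Relational and logical operators in if conditions
count if price > 5000 & foreign == 1
count if foreign != 1
count if !(foreign == 1)

* Missing values count as larger than any number, so exclude them explicitly
count if rep78 > 3 & !missing(rep78)

* Arithmetic operators
generate weight_sq = weight^2
generate price_per_mpg = price / mpg

* + also concatenates strings
generate label = make + " (" + string(mpg) + " mpg)"
list label in 1/3
```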
Commands for descriptive statistics
  summarize varlist
  tabulate var1 var2
  table rowvar (colvar), contents()
  tabstat varlist, statistics() by()
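As a quick illustration on auto.dta (shipped with Stata), the commands can be used as below. The table command is omitted here because its syntax changed in Stata 17, so the form shown on the slide applies to earlier versions.

```stata
sysuse auto, clear

* Basic and detailed summary statistics
summarize price mpg
summarize price, detail

* One-way and two-way frequency tables
tabulate foreign
tabulate foreign rep78

* Selected statistics by group
tabstat price mpg, statistics(mean sd min max) by(foreign)
```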
The egen command
  A frequently used command to create new variables
  Commonly used egen functions (refer to the WB_ES dataset):
    bysort cou sector: egen sales_sec = total(sales), missing
    bysort cou sector: egen sales_sec_mean = mean(sales)
    egen exp_tot = rowtotal(exp_intermediate exp_final)
    egen id_cluster = group(cou sector)
    egen cou_sec = concat(cou sector)
  Further functions include: max, min, count, tag, ...
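To see what these functions return, they can be run on a tiny made-up dataset. The variable names follow the slide; the values are invented for illustration and are not taken from WB_ES.

```stata
clear
input str3 cou sector sales exp_intermediate exp_final
"USA" 1 100 10  5
"USA" 1 200  . 20
"USA" 2  50  0  0
"JPN" 1  80  8  .
"JPN" 2 120  .  .
end

* Group totals and means, repeated on every row of the group
bysort cou sector: egen sales_sec = total(sales), missing
bysort cou sector: egen sales_sec_mean = mean(sales)

* Row-wise sum; missing values are treated as zero
egen exp_tot = rowtotal(exp_intermediate exp_final)

* Numeric identifier and string label for each country-sector pair
egen id_cluster = group(cou sector)
egen cou_sec = concat(cou sector)

list, sepby(cou sector)
```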
String functions
  generate newvar = function()
  Some useful functions are:
  - abbrev(): shortens the string to the indicated number of characters
  - length(): returns the number of characters in the string
  - subinstr(): replaces or deletes particular substrings
  - substr(): extracts a substring based on its position
  - upper() (lower()): changes the entire string to uppercase (lowercase)
  - trim(): removes leading and trailing blanks from the string
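A small sketch of these functions on made-up strings; the variable names and values are purely illustrative.

```stata
clear
set obs 2
generate str20 name = "  United States " in 1
replace name = "japan" in 2

generate clean   = trim(name)                    // remove leading and trailing blanks
generate len     = length(clean)                 // number of characters
generate up      = upper(clean)                  // uppercase
generate first3  = substr(clean, 1, 3)           // 3 characters starting at position 1
generate nospace = subinstr(clean, " ", "", .)   // delete all blanks
generate short   = abbrev(clean, 8)              // abbreviate to 8 characters

list
```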
The collapse command
  collapse (mean) varlist (sum) varlist, by(varlist)
  - Creates an aggregate dataset by e.g. averaging or summing variables across the dimension identified in by()
  - All variables not included in the command are dropped
  - Useful in analysis when moving to a higher level of aggregation, e.g. aggregating trade flows from HS 6-digit to HS 2-digit
  - Useful for calculating descriptive statistics before exporting them to Excel using export excel
The collapse command (cont'd)
  If you do not want to collapse, bysort ...: egen followed by duplicates drop gives the same results as collapse.
  Example (see 02_descriptive.do):
    collapse (mean) sales (sum) dummy_exp, by(cou sector)
  is equivalent (up to the names of the resulting variables) to:
    bysort cou sector: egen avg_sales = mean(sales)
    bysort cou sector: egen number_exporters = total(dummy_exp)
    keep cou sector avg_sales number_exporters
    duplicates drop
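The two routes can be compared side by side on a small made-up dataset. The values below are invented; the collapse call names its target variables (avg_sales, number_exporters) so that both routes produce identically named results.

```stata
clear
input str3 cou sector sales dummy_exp
"USA" 1 100 1
"USA" 1 200 0
"USA" 2  50 1
"JPN" 1  80 1
"JPN" 1  40 1
end

* Route 1: collapse (inside preserve/restore so the firm-level data come back)
preserve
collapse (mean) avg_sales = sales (sum) number_exporters = dummy_exp, by(cou sector)
list
restore

* Route 2: egen by group, then keep one row per group
bysort cou sector: egen avg_sales = mean(sales)
bysort cou sector: egen number_exporters = total(dummy_exp)
keep cou sector avg_sales number_exporters
duplicates drop
list
```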
The reshape command
  reshape wide (long) stub, i(var) j(var) options
  - Reshapes a dataset from long to wide format and vice versa
  - Data dimensions such as country, year or sector are normally put in long format
  - stub refers to variables in reshape wide and to stubs of variable names in reshape long
  - i(var) are the identifying dimensions; j(var) is the dimension to change
  Exercise: Open WDI.dta and reshape it first long and then wide (see 03_reshape_merge.do)

  Long                     Wide
  i   j   stub             i   stub1   stub2
  1   1   4.1              1   4.1     4.5
  1   2   4.5              2   3.3     3.0
  2   1   3.3
  2   2   3.0
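The small table from the slide can be typed in directly to try both directions of reshape. This is just a sketch using the generic names i, j and stub.

```stata
clear
input i j stub
1 1 4.1
1 2 4.5
2 1 3.3
2 2 3.0
end

* Long to wide: one row per i, one column per value of j (stub1, stub2)
reshape wide stub, i(i) j(j)
list

* Wide back to long: one row per i-j pair
reshape long stub, i(i) j(j)
list
```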
i. Merging datasets
  To merge datasets you can use joinby or merge
    joinby varlist using filename, unmatched(both)
  - The command forms all pairwise combinations of observations within the groups defined by varlist
  - unmatched() can keep unmatched observations from the master dataset, the using dataset or both (see the generated variable _merge)
  Exercise: Open WB_ES.dta and merge it with WDI.dta (see 03_reshape_merge.do)
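A minimal sketch of both commands on two invented datasets: a firm-level master and a country-level using file. All values and the variable gdp_pc are made up for the example; in _merge, 1 marks observations found only in the master data, 2 only in the using data, and 3 in both.

```stata
* Country-level data (hypothetical values), saved to a temporary file
clear
input str3 cou gdp_pc
"USA" 50000
"JPN" 40000
"THA"  6000
end
tempfile country
save `country'

* Firm-level data (hypothetical values)
clear
input str3 cou sector sales
"USA" 1 100
"USA" 2  50
"JPN" 1  80
"ZAF" 1  30
end

* merge: many firms per country matched to one country record
preserve
merge m:1 cou using `country'
tabulate _merge
restore

* joinby: same matching here, keeping unmatched rows from both datasets
joinby cou using `country', unmatched(both)
tabulate _merge
```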
Exercises
  - Execute 02_descriptive.do
  - Execute 03_reshape_merge.do (opens WB_ES.dta and merges it with WDI.dta)
j. Macros
  - Macros are names associated with some text
  - The commands global and local assign strings to global and local macro names
  - Globals and locals have a variety of uses:
    - To define the directories for this class, i.e. directory_definition.do
    - They are used in loops (see next slides)
    - A set of explanatory variables can be grouped under one macro name
  - Global macros, once defined, are available anywhere in Stata for the rest of the session
  - Local macros are only available within the do file (or the selected lines of a do file) in which they are defined and run
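For instance, a set of explanatory variables can be stored in a macro and reused. The sketch below uses auto.dta (shipped with Stata); the macro names controls and outcome are arbitrary choices for this example.

```stata
sysuse auto, clear

* Global macro: referenced with a leading $
global controls "mpg weight"

* Local macro: referenced as `name' and only visible while these lines run together
local outcome price

regress `outcome' $controls
display "Outcome: `outcome'; controls: $controls"
```

Because the local exists only for the lines executed together, running the regress line on its own from the do-file editor would leave `outcome' empty.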
k. Loops
  See Stata help and Germán Rodríguez's webpage
  Two main commands: foreach and forvalues
  - foreach loops through strings of text, forvalues loops through numbers
  Syntax (3 alternatives):
    foreach item in a-list-of-things (e.g. a b d) {
        commands referring to `item'
    }
    foreach varname of varlist list-of-variables {
        commands referring to `varname'
    }
    forvalues number = first(step)last {
        commands referring to `number'
    }
Examples for loops in WB_ES.dta
  Ex1
    foreach k in USA JPN {          /* Loop over any list */
        egen sales_`k' = total(sales) if cou == "`k'"
    }
  Ex2
    vallist cou, local(c)           /* vallist (user-written) shows values and creates a local */
    foreach k of local c {          /* Loop over a local macro */
        capture drop sales_`k'
        egen sales_`k' = total(sales) if cou == "`k'"
    }
  Ex3
    forvalues k = 1(1)3 {           /* Loop over sector codes. The range can be defined in different ways */
        egen total_`k' = total(sales) if sector == `k'
    }
  foreach can also be used to loop over variables and numbers:
    foreach k of varlist varlist; foreach k of numlist numlist
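The three examples can be run on a tiny invented dataset with the same variable names. In this sketch the built-in levelsof replaces the user-written vallist, which is an assumption about what is convenient rather than what 04_loops.do does.

```stata
clear
input str3 cou sector sales
"USA" 1 100
"USA" 2  50
"JPN" 1  80
"JPN" 3  20
end

* Ex1: loop over an explicit list
foreach k in USA JPN {
    egen sales_`k' = total(sales) if cou == "`k'"
}

* Ex2: loop over the values stored in a local macro
levelsof cou, local(c)
foreach k of local c {
    capture drop sales_`k'
    egen sales_`k' = total(sales) if cou == "`k'"
}

* Ex3: loop over the numbers 1, 2, 3
forvalues k = 1(1)3 {
    egen total_`k' = total(sales) if sector == `k'
}

* Loop over variables
foreach v of varlist sales_USA sales_JPN {
    summarize `v'
}
```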
l. Graphics
  Useful links: official Stata or UCLA IDRE
  Histograms and bar graphs:
    histogram var
    graph bar (stat) var, over(var)
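A brief sketch of both commands on auto.dta (shipped with Stata). The name() option is added only so that the second graph does not replace the first.

```stata
sysuse auto, clear

* Histogram of car prices
histogram price, name(hist_price, replace)

* Bar graph of the mean price for each category of foreign
graph bar (mean) price, over(foreign) name(bar_price, replace)
```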
More graphics
  Distribution plots:
    histogram var, frequency
    kdensity var
    twoway kdensity var
    twoway (kdensity var1 if var2 == ...) (kdensity var1 if var2 == ...), by(var)
  [Figure: kernel density of lnsales for non-exporters and exporters, graphs by cou (JPN, USA)]
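Something similar to the figure can be sketched with auto.dta (shipped with Stata), where the log of price and the foreign indicator stand in for lnsales and exporter status. The variable lnprice is created only for this example.

```stata
sysuse auto, clear
generate lnprice = ln(price)

* Histogram with frequencies on the vertical axis
histogram lnprice, frequency name(hist_ln, replace)

* Kernel density for the whole sample
kdensity lnprice, name(kd_all, replace)

* Overlaid kernel densities for two groups
twoway (kdensity lnprice if foreign == 0) (kdensity lnprice if foreign == 1), ///
    legend(order(1 "Domestic" 2 "Foreign")) name(kd_groups, replace)
```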
More graphics
  Scatter plots:
    scatter yvar xvar
    graph twoway (scatter yvar xvar) (lfit yvar xvar)
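For example, on auto.dta (shipped with Stata), with mpg and weight chosen arbitrarily as yvar and xvar:

```stata
sysuse auto, clear

* Simple scatter plot
scatter mpg weight, name(sc_simple, replace)

* Scatter plot with a fitted regression line on top
graph twoway (scatter mpg weight) (lfit mpg weight), name(sc_fit, replace)
```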
More graphics
  Line plots:
    line yvar year
    graph twoway (line yvar year if ...) (line yvar year if ...)
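A sketch using uslifeexp.dta, a time-series example dataset shipped with Stata. The cut-off year 1950 is an arbitrary choice to illustrate the if condition.

```stata
sysuse uslifeexp, clear

* Life expectancy over time
line le year, name(line_all, replace)

* Two overlaid lines, each restricted with an if condition
graph twoway (line le_male year if year >= 1950) ///
             (line le_female year if year >= 1950), name(line_sex, replace)
```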
Exercise
  Execute 04_loops.do