Dr. Barbara Morgan Quantitative Methods


195.650 Basic Stata

This is a brief guide to the most basic operations in Stata. Stata also has an on-line tutorial: at the initial prompt, type tutorial. In addition to the reference manuals and on-line help, I would also recommend the following: Hamilton, Lawrence C. Statistics with STATA (Brooks/Cole).

1. Reading in Data

Stata starts in a default working directory. If you want to change this, for example to work off an e drive, start by typing:

    cd e:\

a. Stata data sets

If you are fortunate enough to have your data in a Stata data set (sample.dta) on the e drive, type:

    use sample

b. Excel

If you are creating a dataset in Excel, make sure you have only one row of variable names. The variable names should be 32 characters or less, start with a letter and contain no spaces or special characters. Missing data should be coded either as blank or as a number (e.g. -999) and not as a period. Format the data so that the values contain no commas. Save the data as type CSV (comma separated values). Read the file into Stata using the insheet command:

    insheet using sample.csv

Now the data is in memory. To convert it to a Stata dataset, type:

    save sample

Stata will automatically append .dta to indicate that it is a Stata dataset.

c. Other formats

Data may be in an ASCII file with data separated by blanks or tabs and no variable names. Missing values may be indicated by a period or by a number. Use the infile command and give names to the variables as you read in the dataset:

    infile spend income wealth using sample.raw

For nonnumeric variables, put str# in front of the variable name, where # is a number equal to or greater than the number of characters in the variable:

    infile income wealth str8 country using sample.raw

You can also use Stata's data editor to create a dataset directly. There are also software programs that transform data from one format to another. Stat-transfer is one such (I have a copy you can use).
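To try the CSV steps without a file of your own, the following is a minimal sketch that assumes the auto example dataset shipped with Stata. The file name auto_demo is hypothetical, and outsheet is used only to manufacture a CSV file to read back in.

```stata
* Sketch: round-trip Stata's bundled auto dataset through a CSV file.
* auto_demo is a hypothetical file name chosen for this illustration.
clear
sysuse auto                                  // load the example dataset
keep make price mpg                          // keep the file small
outsheet using auto_demo.csv, comma replace  // write a comma-separated file
clear                                        // empty memory
insheet using auto_demo.csv, comma           // read the CSV back in
describe                                     // check names and storage types
save auto_demo, replace                      // writes auto_demo.dta
```

The first row of the CSV file supplies the variable names, which is why the handout asks for exactly one row of names in the spreadsheet.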

2. Creating Summary Statistics

For basic descriptive statistics of the whole dataset:

    sum

For descriptive statistics of only some variables:

    sum income wealth

For descriptive statistics on a specific group, e.g. mean income for women aged 25-34:

    sum income if female==1 & age>=25 & age<=34

You can get more detailed information by typing:

    sum income, d

In addition to the mean, standard deviation, min and max, this will give you percentiles (1, 5, 10, 25, 50, 75, 90, 95 and 99), variance, skewness and kurtosis.

For categorical variables you can get frequency distributions (for one variable) or cross tabulations using the tabulate or tab command:

    tab gender
    tab gender race

Combine with the sum option to produce means of one variable for each value of another:

    tab gender, sum(income)
    tab gender race, sum(income)

To look at observations, use the command:

    list

To look at one or two variables only:

    list income wealth

To look at individuals with an income less than $5000:

    list if income<5000

To look at a limited number of observations, for example the variables income and wealth for the 21st through 50th observations:

    list income wealth in 21/50

To look at all the variables in your dataset, including nonnumeric ones:

    describe
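The commands in this section can be tried on the auto example dataset shipped with Stata. This is only a sketch: price, mpg, foreign and rep78 are variables of that dataset, standing in for the handout's income, wealth, gender and race.

```stata
* Sketch: summary statistics on the bundled auto dataset.
clear
sysuse auto
summarize price mpg                                      // selected variables
summarize price if foreign == 1 & mpg >= 20 & mpg <= 30  // a subgroup
summarize price, detail                                  // percentiles, skewness, kurtosis
tabulate foreign                                         // one-way frequencies
tabulate foreign rep78                                   // cross tabulation
tabulate foreign, summarize(price)                       // mean price within each group
list make price mpg in 21/30                             // observations 21 through 30
describe                                                 // all variables, including strings
```

The short forms used in the handout (sum, tab, d) are abbreviations of the full names summarize, tabulate and detail written out here.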

3. Running Regressions

a) OLS

Simple OLS regression where the dependent variable is spend and the independent variables are income and wealth:

    reg spend income wealth

The regression command will produce coefficient estimates, standard errors, t-stats, p-values, 95% confidence intervals, an F-test of all coefficients, R-squared, adjusted R-squared, RSS, ESS and TSS.

To run a regression on a subset of the data (e.g. just women, assuming there is a dummy variable female which is 1 for women):

    reg spend income wealth if female==1

To calculate predicted values and residuals after a regression:

    predict yhat
    predict e, resid

where yhat and e are the names of the predicted values and residuals respectively.

b) Dichotomous dependent variables

For probit and logit regressions, where the dependent variable is 0 or 1 (e.g. the variable house might represent whether or not an individual buys a house), the commands are probit and logit:

    probit house income wealth

This command will produce the log likelihood, coefficient estimates, standard errors, z statistics, p-values, 95% confidence intervals, a chi-squared test for all coefficients and the pseudo R-squared. Several lines of iteration output appear on the screen first, because Stata goes through an iterative process before it comes up with the final estimates.

4. Hypothesis Testing

To test whether there is a significant difference in mean income between two groups:

    ttest income, by(female)

To test equality of two coefficients:

    test income==wealth

To test a specific coefficient value:

    test wealth==0.5

You might want F-tests of a certain subset of variables. For example, you may run a regression of spend on income and wealth and some dummy variables, female and white. To test whether the dummy variables are jointly significant:

    reg spend income wealth female white
    test female white

The test command works on the most recently run regression. The output is the F-stat along with its degrees of freedom and p-value.
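A minimal sketch of the regression and testing commands, again assuming the auto example dataset shipped with Stata. The variables price, mpg, weight and foreign belong to that dataset, and the hypothesised value 2 in the last test is arbitrary.

```stata
* Sketch: OLS, predictions, a probit and hypothesis tests on the auto data.
clear
sysuse auto
regress price mpg weight       // OLS of price on mpg and weight
predict yhat                   // fitted values from that regression
predict e, resid               // residuals from that regression
test mpg weight                // joint F-test of both coefficients
test mpg = weight              // equality of two coefficients
test weight = 2                // a specific (arbitrary) coefficient value
regress price mpg weight if foreign == 1   // same model on a subsample
probit foreign mpg weight      // 0/1 dependent variable
ttest price, by(foreign)       // difference in mean price between groups
```

Because predict and test use the most recent estimation results, they are placed directly after the regression they refer to, before any other model is run.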

4. Data Manipulation

a) Keeping and dropping variables

To keep only the income variable:

    keep income

To drop the income and wealth variables:

    drop income wealth

You may want to get rid of some part of your sample. To keep only individuals with positive income:

    keep if income>0

To keep individuals with income $2000-$10000:

    keep if income>=2000 & income<=10000

To drop observations with income $1000 or more:

    drop if income>=1000

To restrict the sample to women:

    keep if female==1

(Note that the double equals sign tests for equality, whereas a single equals sign sets something equal to something else.)

b) Transforming data

After you have loaded your data it may not be in exactly the form you want. Often, you want to create a new variable that is a function of an existing one. The syntax is:

    gen new = f(old)

For example:

    gen lninc = log(income)
    gen incsq = income*income
    gen incfem = income*female

To construct dummy variables:

    gen white = 1 if race == "Caucasian"
    replace white = 0 if race == "AfricanAmerican"

(Since the original variable race is nonnumeric, quotation marks are placed around its values.)

Another command that is frequently used for creating variables is egen. This is often used to create a variable that summarizes other variables:

    egen max_income = max(income)

If there are other transformations you want, the on-line help is good. Click on help and type gen or egen, or use the commands:

    search gen
    search egen
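The sketch below tries these transformations on the auto example dataset shipped with Stata. It also illustrates two points worth knowing, stated here as additions rather than as part of the handout: a logical expression in gen yields 1 or 0 directly, and Stata treats a numeric missing value as larger than any number, so a condition such as income>=1000 is also true for observations with missing income. The variable names created here are hypothetical.

```stata
* Sketch: creating variables and restricting the sample on the auto data.
clear
sysuse auto
gen lnprice = log(price)                 // log transformation
gen mpgsq = mpg * mpg                    // squared term
gen price_for = price * foreign          // interaction with a dummy
* Dummy from a string variable: string values go in double quotes.
* The logical expression is 1 where true and 0 everywhere else.
gen byte concord = (make == "AMC Concord")
egen max_price = max(price)              // same value in every observation
egen mean_price = mean(price), by(foreign)   // group means
* rep78 has missing values; missing counts as larger than any number.
count if rep78 >= 4                      // includes the missing observations
count if rep78 >= 4 & !missing(rep78)    // excludes them
keep if price >= 4000 & price <= 10000   // restrict the sample
```

The two-step gen and replace approach in the handout leaves the dummy missing for any category not named in either step, whereas the one-line logical expression assigns 0 to every observation that does not match.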

5. Using Two Data Sets

a) Merging datasets

You might want to combine different data sets. To add a data set that has the same observations but different variables, use the merge command. To add more observations of the same variables, use the append command. In either case, make sure that the data sets you are going to use are already Stata data sets. For both of these operations, it is assumed that one data set is already open and in memory and the other is stored on a drive.

Suppose one data set (data1) has information on household wealth and is in memory, and another (data2) has household demographics and is on the e drive. If the datasets are organized the same way observation by observation (i.e. observation 1 is household 1 and so on in both data sets), then simply type:

    merge using data2

Now data2.dta has been merged with the data in memory. A new variable called _merge gets created. It tells you the status of the merge: 1 means that the resulting observation occurred only in the data set that was in memory; 2 means that the observation occurred only in the data set that was on the drive; and 3 means that the observation has information from both. If households 1-20 were in my first data set (in memory) and 11-30 were in my second (on the drive), the merged data set would go from 1-30 with missing (though different) values for some variables. The variable _merge would be 1 for observations 1-10, 2 for observations 21-30 and 3 for observations 11-20.

Sometimes the two datasets are not organized as above, but have a variable identifying observations (like a household ID number). You can then merge on that variable:

    merge hhid using data2

Now Stata has created a bigger data set by matching up observations by the variable hhid. The variable _merge is still created. If you need to merge multiple data sets, you need to get rid of the variable _merge after each merge or the next merge won't work. To do this type:

    drop _merge

It is often advisable to sort data prior to a merge command, or for other reasons. You can sort by one or more specified variables:

    sort year income

This sorts the data by year first and then by income within years.

b) Appending data sets

If you have two data sets with the same variables but different observations, use the append command. This assumes there is one data set in memory and one on the e drive:

    append using data2

If there are variables in one data set that are not in the other, missing values will be generated.
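The household example (households 1-20 in memory, 11-30 on the drive) can be reproduced with made-up data. This sketch assumes Stata 11 or later, where the merge type is written explicitly as merge 1:1; the file names demo1 and demo2 and the generated values are hypothetical.

```stata
* Sketch: merge two made-up household files on hhid and inspect _merge.
clear
set obs 20
gen hhid = _n                    // households 1-20
gen wealth = 1000 * hhid         // made-up wealth figures
save demo1, replace

clear
set obs 20
gen hhid = _n + 10               // households 11-30
gen famsize = mod(hhid, 5) + 1   // made-up family sizes
save demo2, replace

use demo1, clear                 // data set in memory
merge 1:1 hhid using demo2       // match one observation per hhid
tabulate _merge                  // 1 = memory only, 2 = drive only, 3 = both
list hhid wealth famsize _merge if inlist(hhid, 5, 15, 25)
```

With the 1:1 form Stata checks that hhid uniquely identifies observations in both files and does not require the data to be sorted beforehand, which is one reason to prefer it when it is available.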

6. Graphing

There are many things you can do to make fabulous-looking graphs. Here are the basics.

a) Scatter plots

To see the raw data relationship between income and wealth, type:

    graph twoway scatter income wealth

b) Histograms

Histograms are good for giving an overview of the distribution of a variable. You can specify the number of bins you want:

    hist income, bin(15)

c) Fitting regression lines

The following commands will produce a partial regression plot for you if you have a multivariate regression. First, you need to estimate the relationship. To do this we use the fit command instead of reg. Then plot the graph using the command avplot:

    fit spend wealth
    avplot wealth
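A sketch of the three kinds of graph on the auto example dataset shipped with Stata. Two things here are assumptions beyond the handout: in current versions of Stata, avplot can be run after regress, so regress is used for the estimation step, and the last line adds a fitted line to a scatter plot with the lfit plot type.

```stata
* Sketch: basic graphs on the bundled auto dataset.
clear
sysuse auto
twoway scatter price weight      // raw relationship between two variables
histogram price, bin(15)         // distribution of price in 15 bins
regress price weight mpg         // estimate the multivariate relationship
avplot weight                    // partial regression plot for weight
twoway (scatter price weight) (lfit price weight)   // scatter with fitted line
```

Each graph command replaces the previous graph window, so run the lines one at a time when working interactively.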

7. Writing .do files

You may want to get familiar with STATA by working interactively initially. However, ultimately the easiest way to use STATA is to create a .do file. What the ???@#%??? is a do file? It's a program that performs commands in sequence and non-interactively. Create the .do file using the do-file editor in STATA and then run it by typing do sample (where sample.do is the name of the file).

Below is an example of a short .do file that assumes there is data in a STATA file called sample.dta. Comments are enclosed in /* */ so that STATA will not read them. This is a useful way to make notes to yourself about what you are doing in a program, particularly if the program gets long.

    /* sets the delimiter, or end of sentence, to a semi-colon */
    #delimit ;
    /* opens up a log file to save your results, with replacement each time the program is run */
    log using sample.log, replace;
    /* makes sure nothing is in memory */
    clear;
    /* loads the Stata dataset sample.dta */
    use sample;
    /* calculates descriptive statistics */
    sum;
    /* runs a regression of spending on income and wealth */
    reg spend income wealth;
    /* performs an F-test on income and wealth */
    test income wealth;
    /* tests equality of two coefficients */
    test income==wealth;
    /* tests a specific coefficient value */
    test wealth==0.5;
    /* creates a new variable */
    gen lninc=log(income);
    /* creates an interaction variable */
    gen femed=female*ed;
    /* drops observations where family size is 5 or more */
    drop if famsize>=5;
    /* closes the log file */
    log close;

All of your results will be saved in the log file. You can also print results directly from the Results screen through the File menu (Print Results).
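For comparison, here is a sketch of a do-file that runs without any data of your own, assuming the auto example dataset shipped with Stata. It keeps the default line-end delimiter instead of #delimit, uses // comments, and starts with capture log close so that a log left open by an earlier failed run does not stop the program. The file names demo.do and demo.log are hypothetical.

```stata
* demo.do: a hypothetical do-file using the bundled auto dataset.
capture log close                       // close any log left open; ignore errors
log using demo.log, replace text        // plain-text log, overwritten each run
clear                                   // make sure nothing is in memory
sysuse auto                             // load the example dataset
summarize                               // descriptive statistics
regress price mpg weight                // regression of price on mpg and weight
test mpg weight                         // joint F-test
gen lnprice = log(price)                // create a new variable
drop if rep78 >= 5 & !missing(rep78)    // drop cars with the top repair rating
log close                               // close the log file
```

Save the lines as demo.do in the working directory and run them by typing do demo.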