A Short Introduction to STATA


1) Introduction:

This session links theoretical equations to tangible results using Stata. Stata is a statistical package with a wide range of capabilities, including data management, statistical and econometric analysis, and graphics. The user's interface includes the following windows (see Figure 1):

Command Window (highlighted in red): the window where we type all the commands.

Results Window (highlighted in blue): the window that displays all the results and output generated by the commands we have typed.

Variables Window (highlighted in orange): the window that shows all the variables currently stored in Stata's memory. We can view these variables as in a spreadsheet by typing browse (or br) in the Command Window, followed by the variables to be displayed (if no variables are specified, Stata shows all of them). If we want to make changes to the data, we type edit in the Command Window.

Command History (highlighted in green): the window that keeps a record of all the commands used in each session.

Current Working Directory (highlighted in black): the window that shows the directory on your computer from which Stata reads files and to which it saves them. It can be changed by typing cd followed by the path of the new directory in the Command Window, e.g.

    cd c:\desktop\state11\session1

or, if the path contains a space, with the path in double quotes:

    cd "c:\desktop\state11\session 1"

It can also be changed from the Stata menu: File/Change Working Directory.

Figure 1: Stata User's Interface
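As a minimal sketch of changing the working directory, the commands below create a throwaway folder whose name contains a space, move into it, and move back out. The folder name "stata demo" is an assumption made up for illustration; any path of your own can take its place.

```stata
* Show the current working directory
pwd

* Create a hypothetical folder whose name contains a space
* (capture suppresses the error if the folder already exists)
capture mkdir "stata demo"

* Double quotes keep a path with a space together as one argument
cd "stata demo"
pwd

* Return to the parent directory and remove the demo folder again
cd ..
capture rmdir "stata demo"
```

Without the quotes, Stata would most likely treat the text after the space as a separate argument and reject the command.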

2) Some Basic Commands:

To clear all the variables held in Stata's memory from the last session, type clear in the Command Window.

To learn how a command is used, which options it allows, or to see examples of its use, type help name_of_the_command or findit name_of_the_command in the Command Window. Try help reg and findit reg and compare the results. If we are not sure of the name of the command we need, we can type search followed by a keyword instead.

Any line that begins with a star (*) is treated as a comment and is not executed by Stata.

Stata can also be used as a calculator through the command display (e.g. display 4+5).

3) Entering Data:

I. Input from .xls or .xlsx files

Suppose your original data source is an Excel file or workbook. Econ526 students may recognize the data set used here, eaef21.xls, from C. Dougherty's textbook Introduction to Econometrics. The command to read it into Stata is

    import excel using eaef21, firstrow case(lower)

Here, excel cannot be omitted, because import handles other formats as well, such as delimited text files. firstrow tells Stata to treat the first row of the Excel file as variable names. In this file the names are all in upper case, so case(lower) is added to convert them to lower case. Variable names in Stata are case sensitive: a capital letter and the same lower case letter name different variables. Likewise, case(preserve) keeps the names exactly as they appear in the Excel file, and case(upper) converts them to upper case.
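Since eaef21.xls may not be at hand, the following sketch reproduces the import step with a workbook it creates itself: it exports Stata's built-in auto data to Excel and reads the file back. The file name auto_demo.xlsx is an assumption chosen for illustration, and the block is meant to be run from a do-file in a directory where you may write files.

```stata
* Load a built-in example data set and write it to an Excel workbook,
* putting the variable names in the first row
sysuse auto, clear
export excel using auto_demo.xlsx, firstrow(variables) replace

* Read the workbook back: firstrow takes the variable names from row 1,
* and case(lower) forces them to lower case
import excel using auto_demo.xlsx, firstrow case(lower) clear

* Inspect what was imported
describe
display _N
```

Replacing case(lower) with case(upper) in the import line is a quick way to see how the option changes the variable names shown by describe.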

II. Input from .csv files

A .csv file differs from an .xls file in that its values are separated by commas. Taking the same data set as an example, save it as a .csv file and load it with the following command:

    import delimited using eaef21.csv

Here you do not need to specify firstrow or case(lower): the first row of the .csv file is used for the variable names, and they are converted to lower case automatically. This works because a .csv file already delimits its fields, which makes it easy for Stata to work out the data structure, so the command is simpler.

Another way to load a .csv file is the older command insheet:

    insheet using eaef21.csv

The two commands yield the same result. Starting with Stata 13, insheet has been superseded by import delimited, so use insheet if you are working with an older version. insheet still works in current versions of Stata, although its help file may no longer be updated.

III. Input from .txt files

Consider the data set "earnings", taken from R. Davidson and J.G. MacKinnon, Econometric Theory and Methods, New York, Oxford University Press, 2004, and stored as a .txt file. The first column is the observation number; columns 2 to 4 are dummy variables for individuals in groups 1, 2 and 3 respectively. The last column is average annual earnings in 1988 and 1989, measured in 1982 US dollars. The file has no variable names in its first row, so you have to supply them yourself. The command for reading such .txt files is infile:

    infile obs d1 d2 d3 earnings using earnings.txt

where obs is the variable name for the observation numbers, and d1, d2, d3 and earnings name the remaining columns in order.
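The two text-based routes above can be tried without the original files. The sketch below first round-trips the built-in auto data through a .csv file, then writes a three-line text file without a header and reads it with infile. The file names auto_demo.csv and earnings_demo.txt, the file handle fh, and the numbers in the text file are all made up for illustration; they are not the Davidson and MacKinnon data.

```stata
* Part 1: write a built-in data set to .csv and read it back
sysuse auto, clear
export delimited using auto_demo.csv, replace
import delimited using auto_demo.csv, clear
describe

* Part 2: write a small text file with no header row (made-up values)
capture file close fh
file open fh using earnings_demo.txt, write replace
file write fh "1 1 0 0 21000" _n
file write fh "2 0 1 0 18500" _n
file write fh "3 0 0 1 25250" _n
file close fh

* Read it with infile, supplying the variable names ourselves
infile obs d1 d2 d3 earnings using earnings_demo.txt, clear
list
```

Because infile assigns names purely by position, listing the names in a different order would silently attach them to the wrong columns, so it is worth checking the result with list.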

IV. Miscellaneous

It is easy to generate observation numbers for a given data set:

    gen n = _n

gen is short for generate, n is the new variable's name, and _n is Stata's built-in index of the current observation. For example, let us regress earnings on the two dummies d1 and d2:

    reg earnings d1 d2

If you want to run the regression without the first 500 observations, add if _n > 500 to the command:

    reg earnings d1 d2 if _n > 500

Since _n lets us refer to specific observations directly, we do not really need the variable obs in our data set. To delete it, use drop:

    drop obs

You can drop variables, and you can also drop observations. Before doing that, let us preserve the data so that we can easily restore it after this destructive trial:

    preserve
    drop if _n <= 1000
    restore

After the drop command, Stata reports that 1,000 observations have been deleted. Once you have preserved the data you can restore it, but only once: each preserve allows a single restore.

The opposite of drop is keep.

    keep earnings

is equivalent to

    drop n d1 d2 d3

To avoid forgetting what a particular variable measures, label it:

    label var earnings "Average annual earnings"

var stands for variable, and the text in quotation marks is the label.

Stata stores its own data sets on the hard drive as .dta files. To open an existing data set, use the following command:

    use earnings

As in every case above, earnings.dta has to be in the current working directory.

Stata also ships with 27 data sets of its own (in version 14). These data sets remain available as long as your Stata installation is intact, and they serve repeatedly as example data in Stata's User Reference Manual, which I highly recommend to anyone who wants to learn more. Type sysuse dir to get a first impression of these data sets. The command to load any of them is sysuse (e.g. sysuse auto).

4) Exploring the Data:

Several commands can help us explore and understand the data better. Type the following command to use the NLSW88 dataset (National Longitudinal Survey of Women in 1988):

    webuse nlsw88

or, if you need to clear data that are already loaded,

    webuse nlsw88, clear

Now try the following commands and see the differences between them:

    describe
    describe wage age
    summarize wage
    sum wage
    summarize wage, detail
    sum wage, de
    list age race married
    list age race married in 1/10
    codebook wage
    inspect wage
    tab race collgrad
    tab race collgrad, nolabel
    tab race collgrad if wage>16.5

Note that when we add if followed by a condition (e.g. wage>16.5), the command is executed only for those observations in the dataset that meet this condition.
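The observation index _n, the preserve and restore pair, and the in and if qualifiers can be tried together on one data set. The sketch below assumes that nlsw88 is among the data sets shipped with your installation, so that sysuse can load it without an internet connection; if it is not, webuse nlsw88, clear should serve the same purpose.

```stata
* Load the example data (assumed to be available through sysuse)
sysuse nlsw88, clear

* Store each observation's position in a variable and label it
gen n = _n
label var n "Observation number"

* Try a destructive step safely: snapshot, drop, inspect, roll back
preserve
drop if _n <= 1000
count
restore
count

* Restrict commands to a range of observations (in) or to a condition (if)
list age race married in 1/5
summarize wage if wage > 16.5
tab race collgrad if wage > 16.5
```

Comparing the two count results shows that restore brings back the observations removed by drop.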

5) Visualizations

A. Histograms

To see the distribution of a variable graphically, we use the command histogram or hist. For example, type

    histogram wage

or, if you would like to overlay a normal density curve,

    hist wage, normal

in the Command Window. You should see the following picture.

[Figure: histogram of hourly wage; x-axis: hourly wage, y-axis: Density]

The picture shows that wage is right skewed.

B. Scatter Graphs

    graph twoway scatter wage tenure
    graph twoway (scatter wage tenure)(lfit wage tenure)

We use lfit to overlay the linear prediction (fitted line) of wage on tenure.

    scatter wage tenure
    scatter wage tenure, by(race)

Note that in the context of graphs, by is used as an option (after a comma) rather than as a prefix.

C. Matrix Graphs

    graph matrix wage tenure hours

D. Box Graphs

    graph box wage, over(race)

The following picture will be generated:

[Figure: box plot of hourly wage over race (white, black, other)]

From the picture, it seems that the median wage does not differ much among the three ethnic groups, even though whites have more high-income outliers.

6) An OLS regression:

To run an OLS regression we can use the command regress or, in short, reg, followed by the dependent variable (the one we want to explain) and the independent variable or variables (the ones that we suspect explain the dependent variable). For example,

    reg wage tenure collgrad married

runs a regression of wage on tenure, collgrad, and married.

After running a regression, Stata temporarily stores some useful items (until another regression is run). For example, we can generate the residuals of the regression with the command predict:

    predict myresids, residuals

The residuals of the regression above are then saved in the variable myresids. Are the residuals correlated with some other variable that is perhaps missing from the regression? Use the command correlate or a scatter graph to check this.

7) Hypothesis Testing

Hypothesis testing is straightforward in Stata. For instance, to test whether the coefficient on tenure equals zero:

    test tenure = 0

which gives the result:

    ( 1)  tenure = 0

          F(  1,  2227) =   58.18
               Prob > F =    0.0000

This is a single-coefficient test. The joint significance test that the coefficients on collgrad and married both equal zero is:

    test collgrad = married = 0

which gives the result:

    ( 1)  collgrad - married = 0
    ( 2)  collgrad = 0

          F(  2,  2227) =   80.20
               Prob > F =    0.0000

The following commands give you the fitted values and the residuals of the regression:

    predict yhat, xb
    predict u, res

The command is predict; yhat and u are the names of the new variables; the option xb tells Stata that you want the fitted values, and res is short for residuals. You will find two more variables in your variable list.

Finally, all the useful information from the regression is stored in the e-class (e stands for estimation) returns. Take a look at them by typing the following command after the regression:

    ereturn list

8) Extra Resources

http://www.stata.com/links/resources-for-learning-stata/
http://www.stata.com/links/video-tutorials/
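The regression and testing steps from sections 6) and 7) can be collected into one short do-file. This is only a sketch: it assumes nlsw88 can be loaded with sysuse (webuse nlsw88, clear is the online alternative), and the choice of hours as the variable to compare with the residuals is made here purely for illustration.

```stata
* Load the example data (assumed to be available through sysuse)
sysuse nlsw88, clear

* Section 6: regression and residuals
reg wage tenure collgrad married
predict myresids, residuals

* Check whether the residuals move together with a variable
* that was left out of the model (hours is an illustrative choice)
correlate myresids hours
scatter myresids hours

* Section 7: single-coefficient and joint tests
test tenure = 0
test collgrad = married = 0

* Fitted values and the stored estimation results
predict yhat, xb
ereturn list
```

A correlation far from zero or a visible pattern in the scatter graph would suggest adding that variable to the regression and running the tests again.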