Introduction to Stata: An In-class Tutorial

I. The Basics
- Stata is a command-driven statistical software program. In other words, you type in a command, and Stata executes it.
  - You can use the drop-down menus to avoid typing in commands, but that won't be a long-term solution because you will want to learn most commands in order to create do files (described below).
  - If you use a command from a drop-down menu, Stata will print out the words you could have typed to execute it without the menu. This is one way to learn the commands (but not the recommended way, as sometimes those drop-down commands ...).
  - There are other statistical packages similar to Stata. Once you learn one, it is very easy to learn another.
- You've heard about Stata's advantages. Here are some of its drawbacks:
  - The most obvious is the need to learn a new language.
  - Stata's GUI is good, but not as good as most Windows apps.
  - Stata's graphics get better with each version, but Excel graphs are easier to format and create.
  - The Stata help can be frustrating, because you usually need to know the command name to get help using it. I have posted a file to the course page with a list of frequently used Stata commands.

II. The First Time
- Configure Stata Windows
  The first time you start Stata, you will see a number of windows that fill up only part of your desktop. You will want to resize and relocate the various windows. If you make changes to the windows, Stata will default to these changes the next time you start up. You can also change fonts by right-clicking in each window.
  If you want to save a particular window configuration for future use, you can click Edit, Preferences, Save Preference Set, New Preference Set. You can then load this windowing configuration by going through the same menu.
  To allow you to scroll back and look at a longer history of results, click Edit, Preferences, General Preferences, Windowing and enter 500 (or 50000) in the first box (see screen shot below).

- Icons
  We'll go through those I have labeled by the end of the tutorial.
- (Optional) set matsize yyy, where yyy is the number of variables you are going to use in a regression (e.g., set matsize 500). You should not need this unless you are working with hundreds of variables. More expensive versions of Stata permit larger limits.

III. Opening, Manipulating, and Saving Data Files
- Stata permits you to invoke limited DOS commands. Though they are archaic, it is useful to learn a few.
    cd d:\decs431   (to change the working directory)
    dir             (to list the contents of a folder)
    pwd             (to see the current path and folder)
  You can also go to File, Change Working Directory to locate the directory you want.
  I find it useful to change my default folder to the one that holds my data set. You can change Stata's default working directory by right-clicking on the Stata icon in your program directory, then selecting Properties, Shortcut. Enter the directory in which you want Stata to start, e.g. d:\decs431
- Opening Stata Data Files
  Stata works natively with Stata data files, which have the suffix .dta (data in other formats must be imported first). Just as with other software, there are several options for opening a Stata data file.
  One nice option is to go to the File menu and select Recent Datasets.
  Another option is to type a Stata command, telling Stata to use a particular file. The syntax for the command is
    use path\filename
  For example, on my laptop I have a file autosales.dta that resides in my d:\decs431 folder. To open that file, I have two command choices:
    use d:\decs431\autosales
  or
    cd d:\decs431
  followed by
    use autosales
  Note: When referring to path names with spaces (e.g., d:\my path\file name), you must use quotation marks (e.g., "d:\my path\file name").
  The file autosales.dta is now our active dataset. Note how these (and all other) commands appear in the Review window. You can press the page up or page down keys to access past commands. Alternatively, you can click on past commands in the Review window.
- Examine your data
  You can re-enter a command by clicking on it. If you don't want to type variable names, you can click on them.
  describe will describe the data: you get the name and size of the file, number of observations, variable names and labels, and the storage type for each variable (string or various numerical types).
  You can provide labels (brief descriptions) for your variables using the label variable command. These labels appear when you use the describe command. Try label var month "Month of Sale". It is a good idea to label every variable in your data unless the variable name is fully self-explanatory. [1]
  Another way to look at variables is using codebook. This gives much more detail but can take significant computer time on large datasets. A useful alternative is codebook, compact
  You can eyeball the data in their rows and columns, and make changes to specific observations, by clicking on the edit button. The browse icon lets you look, but doesn't let you edit the data.
  You can also examine your data by typing list var1 var2 (fill in the variables you want to explore). If you only want to see observations in rows m through n, type list var1 var2 in m/n. (E.g., to see the first 10 rows, type list var1 var2 in 1/10.)
  You can get summary statistics, tabulations, and correlations:
    summarize odometer           (gives the mean, std. dev., range, and sample size)
    summarize odometer, detail   (same as above, plus the median and other percentiles of the distribution, as well as the smallest and largest values)
    histogram odometer           (produces a histogram)
    tab year                     (lists every value with its frequency and percentage)
    corr year odometer           (for a simple correlation matrix)
    pwcorr year odometer, sig    (to obtain significance levels for the correlations)
- Saving your data
  You can use the GUI: simply click the Save icon.
  You can type save autosales2. Always keep an unchanged version of the original data.

[1] Variable names may be 1 to 32 characters long, and must start with a-z, A-Z, or _, and the remaining characters may be a-z, A-Z, _, or 0-9. When referring to an existing varname, we can abbreviate (use only some of the leading characters) as long as we specify enough to uniquely identify the variable: Myv might be a unique abbreviation for Myvar. (Excerpted from the Stata Reference Manual.)

IV. Brief Summary of Stata Syntax
- Stata has a few predetermined ways of entering commands. Once you learn them it will be very easy to learn new commands. The basic syntaxes are:

  Basic Syntax 1: command variable1 variable2 ... variablen
    summarize odometer year

  Basic Syntax 2: command variable1 ... variablen if logical expression
    summarize odometer year if odometer>30000
  The if operators are:
    if a == b (two equals signs): if a equals b
    if a > b: if a is greater than b
    if a < b: if a is less than b
    if a >= b: if a is greater than or equal to b
    if a <= b: if a is less than or equal to b
    if a ~= b: if a does not equal b (you can also write a != b)
  You can combine two or more of these statements with the logical AND (&) or the logical OR (|). Be sure to use parentheses as needed!
  The if qualifier works row by row, just like in Excel. It executes the command only for the rows in which the condition is true.

  Basic Syntax 3: command variable1 ... variablen in value1/value2
    summarize odometer year in 5/10
    list odometer year in 5/10
  This is similar to the if qualifier. It executes the command for all rows from value1 through value2, and signals an error if value1 is greater than value2.

  Basic Syntax 4: command variable1 ... variablen, options
    summarize odometer year, detail
  Different commands have different options. You will get to know the most useful options as we go along. The comma (,) separates the command line from the options. The options are always the last thing in a command line. For example, the following is valid:
    summarize odometer year if price>10000, detail
  This is NOT valid:
    summarize odometer year, detail if price>10000

  Alternative Syntax (we'll go over them again, no need to memorize them):
    DOS commands:
      dir
      cd directory
      erase filename
    Stata control commands:
      do filename
      edit
      clear
      use filename
      set parameter value

  Remember, if you forget how to write something, you can always find it in the drop-down menus or by typing help command

V. Keeping Track of What You Have Done
- One way is to click on the print icon. This will waste a bunch of paper.
- I strongly urge you to create a Stata log: a separate file containing all your work. Here's how to create a log file.
  Click on the log button. This opens a familiar save file type window. Save the file as type Log (DO NOT USE the .smcl default).
  Alternatively, you can type log using mylogfile.log

- Stata will save a record of your work in a text file. You can edit and print this record using MS Word or Wordpad.
- You can temporarily stop the log file by typing log off and restart it by typing log on. In this way, you can avoid clogging up the log with unnecessary stuff. Type log close to end your log file.
- You can place comments in your log by placing an asterisk (*) before the Stata command line. Stata will not execute lines beginning with *
- To print the log file from within Stata, click on the log button again and select View snapshot of log file. Your log file will appear in a separate window.

VI. The Do File editor
The Do file editor is a document editor where you write the commands as if you were writing them directly in the command line. It helps keep track of your work.
You can re-run any segment by highlighting it with the mouse and clicking one of the two execute buttons. If you click the execute buttons without highlighting anything, the whole document will be executed.

Execution stops if an error is generated, for example, if you try to create a variable that already exists.
Each command must be on a separate line.
Use * at the beginning of a line to insert comments. That line will not be executed.
Use // at the end of a line to insert comments. The rest of the line will not be executed.

VII. Creating Variables
- Suppose you want a variable that equals the age of each car at the time of sale. Just pick a variable name (say, ageatsale) and use the generate, or gen, command:
    gen ageatsale = year-modelyear
  Note that missing values are recorded as . (a period). Stata usually ignores these values. More on this later.
- Use replace to change values of an existing variable (e.g. replace ageatsale=ageatsale*12)
- Use the rename command to change the name of a variable. Alternatively, you can right-click the variable in the Variables box.
- Use the if qualifier to execute any command on a subset of observations. For example:
    replace ageatsale=0 if ageatsale<0
- Dummy variables: These are variables that equal one for some observations and zero for others. These can be used to designate mutually exclusive and exhaustive groups. For example, to build a variable that separates high-mileage cars from low-mileage cars:
    generate mileagehigh = (odometer>40000)
  Let's see if it worked:
    sum mileagehigh
  WARNING: when it comes to evaluating if statements, Stata treats missing values as equaling positive infinity. A safe way to execute the above command:
    generate mileagehigh = (odometer>40000) if !missing(odometer)
- Use the drop command to get rid of variables:
    drop mileagehigh
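The logging, do-file, and variable-creation steps above can be sketched as one short do-file. This is only an illustration under stated assumptions: the four rows typed in with input are made-up stand-ins for autosales.dta, the file name sketch.do and the variable naive are hypothetical, and the text and replace options on log using are additions not discussed in the tutorial.

```stata
* sketch.do: a hypothetical do-file pulling together Sections V to VII
capture log close                       // close any log left open by an earlier run
log using mylogfile.log, text replace   // plain-text log; replace overwrites an old copy

* Toy stand-in for autosales.dta (made-up values, for illustration only)
clear
input year modelyear odometer price
2004 2000 45000  9500
2004 2003 12000 14800
2004 1998     .  6200
2004 2005  3000 16900
end

generate ageatsale = year - modelyear   // age of the car at sale, in years
replace ageatsale = 0 if ageatsale < 0  // the fourth row has modelyear > year

* Naive dummy: the missing odometer is treated as larger than 40000
generate naive = (odometer > 40000)

* Safe dummy: left missing wherever odometer is missing
generate mileagehigh = (odometer > 40000) if !missing(odometer)

list year modelyear odometer ageatsale naive mileagehigh
drop naive                              // remove the misleading variable

log close
```

In the listing, the third row (missing odometer) should show naive equal to 1 but mileagehigh as missing, which is the pitfall the warning above describes.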

VIII. Estimating a Regression
- Suppose you want to estimate the following model:
    price = B0 + B1*ageatsale + B2*odometer
- Just type reg price ageatsale odometer
- Or, type regress and then click on the variables in the Variables window.
We will learn what all of the output means as the course progresses.

. reg price ageatsale odometer

      Source |       SS       df       MS              Number of obs =   50000
-------------+------------------------------           F(  2, 49997) =       .
       Model |  2.7932e+11     2  1.3966e+11           Prob > F      =  0.0000
    Residual |  6.6499e+10 49997  1330068.59           R-squared     =  0.8077
-------------+------------------------------           Adj R-squared =  0.8077
       Total |  3.4582e+11 49999  6916518.91           Root MSE      =  1153.3

------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   ageatsale |  -763.9426   5.882457  -129.87   0.000    -775.4723   -752.4129
    odometer |  -.0620302    .000334  -185.74   0.000    -.0626848   -.0613757
       _cons |   12648.41   8.707638  1452.57   0.000     12631.35    12665.48
------------------------------------------------------------------------------
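If you do not have autosales.dta at hand, the same pattern can be tried on a dataset that ships with Stata. The sketch below is an assumption-laden stand-in, not the course example: it uses sysuse auto with the variables mpg and weight in place of ageatsale and odometer, and the name pricehat is hypothetical.

```stata
* Stand-in data: auto.dta ships with Stata; it is not the author's autosales.dta
sysuse auto, clear

regress price mpg weight   // same pattern as: reg price ageatsale odometer

display _b[weight]         // estimated coefficient on weight
display e(r2)              // R-squared stored by regress
predict pricehat           // fitted values (the default after regress)
summarize price pricehat   // compare actual and fitted prices
```

After regress, coefficients stay available as _b[varname] and summary results as e() values until another estimation command is run, so you can reuse them without retyping numbers from the output table.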