Introduction to STATA

Similar documents
Introduction to SAS. I. Understanding the basics In this section, we introduce a few basic but very helpful commands.

Dr. Barbara Morgan Quantitative Methods

After opening Stata for the first time: set scheme s1mono, permanently

Introduction to Stata: An In-class Tutorial

INTRODUCTION to. Program in Statistics and Methodology (PRISM) Daniel Blake & Benjamin Jones January 15, 2010

Intermediate SPSS. If you have an SPSS dataset (*.sav), you can open it in the following way:

Applied Regression Modeling: A Business Approach

Revision of Stata basics in STATA 11:

International Graduate School of Genetic and Molecular Epidemiology (GAME) Computing Notes and Introduction to Stata

Intermediate Stata. Jeremy Craig Green. 1 March /29/2011 1

Applied Regression Modeling: A Business Approach

Quick Start Guide Jacob Stolk PhD Simone Stolk MPH November 2018

User Services Spring 2008 OBJECTIVES Introduction Getting Help Instructors

STATA 13 INTRODUCTION

Intro to Stata for Political Scientists

STM103 Spring 2008 INTRODUCTION TO STATA 8.0

1. Basic Steps for Data Analysis Data Editor. 2.4.To create a new SPSS file

An Introductory Guide to Stata

Introduction to STATA

addition + =5+C2 adds 5 to the value in cell C2 multiplication * =F6*0.12 multiplies the value in cell F6 by 0.12

Advanced Regression Analysis Autumn Stata 6.0 For Dummies

A quick introduction to STATA:

Basic Stata Tutorial

An Introduction to Stata Part II: Data Analysis

Dealing with Data in Excel 2013/2016

Using Large Data Sets Workbook Version A (MEI)

STA 570 Spring Lecture 5 Tuesday, Feb 1

Research Methods for Business and Management. Session 8a- Analyzing Quantitative Data- using SPSS 16 Andre Samuel

Excel Assignment 4: Correlation and Linear Regression (Office 2016 Version)

Bluman & Mayer, Elementary Statistics, A Step by Step Approach, Canadian Edition

BIOSTATISTICS LABORATORY PART 1: INTRODUCTION TO DATA ANALYIS WITH STATA: EXPLORING AND SUMMARIZING DATA

A Short Guide to Stata 10 for Windows

Stata v 12 Illustration. First Session

You will learn: The structure of the Stata interface How to open files in Stata How to modify variable and value labels How to manipulate variables

2016 SPSS Workshop UBC Research Commons

A Short Introduction to STATA

Scatterplots. This handout focuses mainly on the Stata menu approach to obtaining scatterplots, but display equivalent command-line language.

1. Data Analysis Yields Numbers & Visualizations. 2. Why Visualize Data? 3. What do Visualizations do? 4. Research on Visualizations

WORKSHOP: Using the Health Survey for England, 2014

Exploratory data analysis with one and two variables

range: [1,20] units: 1 unique values: 20 missing.: 0/20 percentiles: 10% 25% 50% 75% 90%

TYPES OF VARIABLES, STRUCTURE OF DATASETS, AND BASIC STATA LAYOUT

SAS Training Spring 2006

An Introduction to Stata Part I: Data Management

A Quick Guide to Stata 8 for Windows

8: Statistics. Populations and Samples. Histograms and Frequency Polygons. Page 1 of 10

Scatterplot: The Bridge from Correlation to Regression

1 Introducing Stata sample session

Week 1: Introduction to Stata

Introduction to Statistics lab 1

ICSSR Data Service. Stata: User Guide. Indian Council of Social Science Research. Indian Social Science Data Repository

Intermediate Stata Workshop. Hsueh-Sheng Wu CFDR Workshop Series Spring 2009

Stata: A Brief Introduction Biostatistics

How Do I Choose Which Type of Graph to Use?

Introduction to Statistics lab 1

Box-Cox Transformation for Simple Linear Regression

Introduction to Stata

Lastly, in case you don t already know this, and don t have Excel on your computers, you can get it for free through IT s website under software.

THE LINEAR PROBABILITY MODEL: USING LEAST SQUARES TO ESTIMATE A REGRESSION EQUATION WITH A DICHOTOMOUS DEPENDENT VARIABLE

Minitab Study Card J ENNIFER L EWIS P RIESTLEY, PH.D.

Introduction to Stata Toy Program #1 Basic Descriptives

A quick introduction to STATA:

SPSS TRAINING SPSS VIEWS

Chapter 11 Dealing With Data SPSS Tutorial

Preparing for Data Analysis

Bivariate (Simple) Regression Analysis

Your Name: Section: INTRODUCTION TO STATISTICAL REASONING Computer Lab #4 Scatterplots and Regression

IDS 101 Introduction to Spreadsheets

Intro to STATA Lecture Notes Stuart Soroka Department of Political Science, McGill University January 2010 *

Results Based Financing for Health Impact Evaluation Workshop Tunis, Tunisia October Stata 2. Willa Friedman

Graphing on Excel. Open Excel (2013). The first screen you will see looks like this (it varies slightly, depending on the version):

Getting started with Stata 2017: Cheat-sheet

1 Introduction to Using Excel Spreadsheets

Further Maths Notes. Common Mistakes. Read the bold words in the exam! Always check data entry. Write equations in terms of variables

Survey of Math: Excel Spreadsheet Guide (for Excel 2016) Page 1 of 9

Multiple Regression White paper

Four steps in an effective workflow...

Independent Variables

Introduction to Stata - Session 2

Subject. Creating a diagram. Dataset. Importing the data file. Descriptive statistics with TANAGRA.

Math 227 EXCEL / MEGASTAT Guide

Introduction to WHO s DHIS2 Data Quality Tool

A First Tutorial in Stata

Workshop for empirical trade analysis. December 2015 Bangkok, Thailand

Introduction. About this Document. What is SPSS. ohow to get SPSS. oopening Data

SBCUSD IT Training Program. MS Excel lll. VLOOKUPS, PivotTables, Macros, and More

Studying in the Sciences

Easing into Data Exploration, Reporting, and Analytics Using SAS Enterprise Guide

Department of Economics Spring 2016 University of California Economics 154 Professor Martha Olney Stata Lesson Wednesday February 17, 2016

Preparing for Data Analysis

Frequency Tables. Chapter 500. Introduction. Frequency Tables. Types of Categorical Variables. Data Structure. Missing Values

Introduction to Excel Workshop

MATH 1070 Introductory Statistics Lecture notes Descriptive Statistics and Graphical Representation

STATA Version 9 10/05/2012 1

ECO375 Tutorial 1 Introduction to Stata

Chapter 6: DESCRIPTIVE STATISTICS

StatLab Workshops 2008

Math 121 Project 4: Graphs

Stata version 13. First Session. January I- Launching and Exiting Stata Launching Stata Exiting Stata..

M7D1.a: Formulate questions and collect data from a census of at least 30 objects and from samples of varying sizes.

Transcription:

Center for Teaching, Research and Learning Research Support Group American University, Washington, D.C. Hurst Hall 203 rsg@american.edu (202) 885-3862 Introduction to STATA WORKSHOP OBJECTIVE: This workshop is designed to give a basic understanding of how STATA works and how to use simple statistical methods to analyze data. LEARNING GOALS: 1. Understand the basics of the STATA interface 2. Managing data 3. Computing descriptive statistics 4. Creating basic graphs 5. Running simple linear regressions SCENARIO: We are interested in studying the relationship between the amount of corruption in a country and the quality of their economy. DATA: For this question, we are lucky as data already exists. We will use the World Development Indicators which are compiled annually by The World Bank. We are using data from a past year, and we have simplified the data set, but you can find the full data set here: http://data.worldbank.org/data-catalog/world-development-indicators. We are also lucky, because the World Bank saves their data in a way that makes it easy to open. Often when you find data set (for example, old Census data), it can take a lot of effort to transform the data into a usable form. And, of course, there is never a guarantee that someone else has created the data set you want in advance; often the first step in a research project is spending a long time looking up values to create the data set you need. Within the WDI data set, we will focus on the variables corrup and gdppc. The first is an index of perceived corruption within a country s government. Citizens are surveyed, and asked questions like Would you need to plan bribes to start a local business? Note that corrupt is reverse coded so that high scores equal low corruption. The second variable is Gross Domestic Product per Capita, which indicates how well the economy is doing, scaled for the population size of a country (for example, it is not very interesting to note that Botswana produces less than Brazil overall, but it might be interesting to note that Botswana produced more per citizen than Brazil in 2014). Note that, in using these variables from the WDI data set, we have given over the responsibility of determining what corruption and quality of economy means to the World Bank. We have also 1

allowed dynamic concepts to become simple numbers. However, this allows us to move quickly to the later stages of analysis, which has its advantages. I. Understanding the basics In this section, we introduce the STATA interface and few basic but very helpful commands. the Stata Interface. When you start Stata, you will see the default Stata interface. The interface is made up of several small windows: command, results, variables, properties, and review. This view is customizable and can be altered. You can resize, close, or change the location of these windows using Edit/Preferences menu. Results Panel The large central panel is the Results Panel. Outputs of your analysis appear in this window. Only graphics will appear in a separate window. It can display only a limited number of lines at a time. Variables Panel The variables panel, on the top right, lists all variables in the active dataset. This includes the variable name, its label, and other information. Double click on a variable, and it will appear in the command window. 2

Properties Panel The Properties panel immediately below the variables panel displays basic features of your variables and dataset including variable type, number of observations, etc. Review Panel The Review panel logs all commands (from the command window) as they are entered. This allows you to keep track of every command you have run and run a command again if necessary. Click on an old command from the Review panel, and it will appear again in the command window. Command Panel The window labeled Command is where you type the syntax to run. One way to execute Stata commands is to type them directly into the command window and hit Enter. Do-File We recommend that you create a do file before you start working in STATA. A Do-file allows you to record and save all of your command lines so that you can repeat those steps in the future. Do-file Editor, New Do File Save Do File to the desktop or a USB drive. II. Managing Data In this section, we cover opening and saving data and creating and labeling new variables. Opening Data File, Open/Import Example path: C:\Users\<username>\Desktop\Workshop Data\cs.dta You may save copy of data as cs.dta in to practice with later. Copy command line into Do-file III. Descriptive Statistics In this section, we will demonstrate how to obtain information on descriptive statistics for one or more variables. A full description of all variables in the data can be obtained with the describe command: describe To generate a table of basic descriptive statistics, such as the number of observations, mean, standard deviation, minimum, and maximum use the command 3

sum corrup gdppc STATA displays each variable name, along with its descriptive statistics in table form. To insert these results into another document, highlight the table, right-click, and select any of copy options. The results can now be pasted into another document. Other useful descriptive commands: codebook, tabulate or tab IV. Recode and Generate a new variable Often, we are interested how the percentage change in one variable affected another variable. In those cases, we usually take the log of that variable. As an example, we will generate the log of Gross Domestic Product per capita (gdppc). gen log_gdppc=log(gdppc) Making a frequency table of a continuous variable is not very useful. If we want a good frequency table, we should recode. Look at the description for the variable (corrup) you want to recode. tab corrup codebook corrup Because of the way corruption is measured we want to reverse it, so that higher values indicate more corruption. gen corrup2 = 10 - corrup We will recode this variable into three categories. Additionally, we will generate a new variable for the recoded data and provide labels for the new values. recode corrup2 (0/3.3 = 1 "Low") (3.31/6.66 = 2 "Medium") (6.67/10 = 3 "High"), gen (corrup_categ) tab corrup_categ We can also assign a label to the variable itself to help us remember what it is. label variable corrup_categ "Corruption Index in 3 categories" tab corrup_categ IV. Creating Basic Graph In this section, we demonstrate how to generate a histogram and a scatter plot. A histogram is used to show the frequency distribution of a variable. Let s generate a histogram using the variables we just created 4

hist gdppc hist gdppc, freq title ("GDP per Capita Frequency") note ("Source: World Bank") In addition to the histogram, we also titled the graph and noted the data source. Suppose you are curious about the relationship between GDP per capita (gdppc) and the Corruption Index of a country (corrup2). Let s create a scatter plot to help visualize the relationship between these two variables. We will also include a trend line and title the graph to help us understand what we are seeing. twoway (scatter gdppc corrup2) twoway (scatter gdppc corrup2) (lfit gdppc corrup2) title( GDP per captia and Corruption Relationship ) note("source: World Bank") As we can see, there is a negative relationship between these two variables. We also might want to look at the average GDP per capita for each of our categorical levels of corruption. For this, we can use a bar graph. graph bar (mean) gdppc, over(corrupcat) V. Bivariate Analysis T-test A t-test is the most basic way to compare two groups together, based on their scores on a single variable. While we divided the Corruption Index (corrupt_categ) into three groups, you can only compare two of those groups at a time using a t-test. Let s compare the medium and high corruption countries (group 2 and 3) ttest gdppc if inlist(corrup_categ,2,3), by(corrup_categ) In the Output, look for the t, which gives the value of the t-test, degrees of freedom, and the p-value or Pr(T,t), which tells you the probability of getting that result. Remember that, in general, you are looking for a probability value less than.05. Correlation The correlation coefficient is a measure of association between two variables. The sign indicates inverse versus directly proportional relationships. 0 would be a nonrelationship, while a value of 1 would be a perfect positive relationship, and -1 would be a perfect inverse relationship. Let s look at the correlation between GDP per capita and Corruption Index pwcorr gdppc corrup2 Your output will be a 2x2 table, but only one cell is relevant. Use either cell that shows the relation between your two variables. VI. Running Linear Regressions 5

In this section, we show you how to run a simple regression in order to understand the relationship between GDP per capita and Corruption Index. regress gdppc corrup2 STATA displays the regression results in table form. The important information provided is the Adj R-squared, coef. and t/p> t. VI. A few tricks that will make you more efficient Press Page Up to retrieve your last command; press it more than once to retrieve earlier commands Click on a command in the Review window and it will be pasted into the Command window for editing. Double click on a command and it will be executed again Click on a variable name in the Variables window and it will be pasted into the Command window at the current location of the cursor Press q or click on the circled-x button at the top to interrupt a command in progress (the button turns red when something is running) Use the Properties window to learn about your data set, the individual variables it contains, and how much memory Stata is using. Help Command We can use the help command to search for help with specific commands. For example, to search for help about the summarize command, type (in the command window): help summarize 6