Stata versions 12 & 13 Week 4 Practice Problems

SOLUTIONS

(Sol_statlab1.docx, revised 3/2/2014)

1. Practice Screen Capture

a. Create a Word document. Name it using the convention lastname_lab1.docx (e.g., bigelow_lab1.docx).
b. Using your browser, go to the welcome page for PubHlth 640.
c. From there, navigate to the assignments page.
d. Capture the picture of the ostrich.
e. Paste the picture into lastname_lab1.docx, inserting it into a table with 1 row and 1 column.

2. Launch Stata and Start a Log of Your Session

IMPORTANT: Use the extension .log, not .smcl.

a. Launch Stata.
b. Start a log of your session, with extension .log. Name it lastname_log1.log (e.g., bigelow_log1.log).
c. In the command window, type:

    set more off
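The log can also be started and closed from the command window rather than the menus. A minimal sketch, assuming the log is written to the current working directory; the file name follows the naming convention above and is only a placeholder:

```stata
* Close any log left open from an earlier session
* (capture suppresses the error if no log is open)
capture log close

* Start a plain-text log; the text option forces .log format rather than .smcl
log using "lastname_log1.log", text replace

set more off

* ... analysis commands go here ...

* Close the log at the end of the session
log close
```

The replace option overwrites an existing file of the same name; use append instead to add to an earlier log.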

3. Create a Graph, Save It, and Paste It into Your Word Document

a. Launch Stata.
b. In the command window, type:

    use http://www.pauldickman.com/survival/ivf.dta, clear

c. In the command window, type:

    histogram hyp, discrete

d. Save this as a PNG graph, with name hypertension_bar.png, to your desktop.
e. Paste your graph into lastname_lab1.docx, again inserting it into a table with 1 row and 1 column.
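Step d can also be done with a command instead of the Graph window menu. A minimal sketch, assuming a Windows or Mac installation of Stata with a graphical interface; the file is written to the current working directory unless a full path is given:

```stata
use "http://www.pauldickman.com/survival/ivf.dta", clear
histogram hyp, discrete

* Export the graph currently displayed to a PNG file
graph export "hypertension_bar.png", replace
```

Typing pwd shows the current working directory, and cd changes it (for example, to the desktop folder).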

4. Create a New Data Set in Stata Using the Data Editor

a. Execute the following commands to create a Stata data set with the following 4 observations:

    id          dob          gender                weight
    (numeric)   (date)       (string/character)    (numeric)
    1           3/26/1926    male                  161.3
    2           6/9/1956     female                120.1
    3           4/1/1954     male                  223.2
    4           11/4/1951    female                124.0

    * STEP 1: Clear the current data from memory
    clear

    * STEP 2: Define variables (lower case recommended). Set type. Initialize to missing.
    generate id=.
    generate str8 dob_string=""
    generate str8 gender=""
    generate weight=.

    * STEP 3: Click on DATA EDITOR icon to access an initially empty spreadsheet
    * Enter the data. Then close the data editor window.

    * STEP 4: Create a DATE variable called dob (date of birth). Drop string variable.
    generate dob=date(dob_string, "MDY")
    format dob %tdNN/DD/CCYY
    drop dob_string

    * STEP 5: Create 0/1 indicator of female gender
    generate female=(gender=="female")

    * STEP 6: Label variables
    label variable id "Subject id"
    label variable weight "weight (lbs)"
    label variable dob "Date of birth"
    label variable female "0/1 female"

    * STEP 7: Create discrete variable value labels (the dictionary)
    label define femalef 0 "male" 1 "female"

    * STEP 8: Attach labels to discrete variable values
    label values female femalef

    * Produce listing of data
    list

    * Save data set using FILE > SAVE AS
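As an alternative to typing values into the Data Editor, the same four observations can be entered with the input command. This is a sketch rather than the handout's method; it assumes the weights are recorded in pounds to one decimal place, and it declares dob_string as str10 so that the longest date fits without truncation:

```stata
clear

* Enter the four observations directly; strings go in double quotes
input id str10 dob_string str8 gender weight
1 "3/26/1926" "male"   161.3
2 "6/9/1956"  "female" 120.1
3 "4/1/1954"  "male"   223.2
4 "11/4/1951" "female" 124.0
end

* Convert the string date to a Stata date and display it as MM/DD/YYYY
generate dob = date(dob_string, "MDY")
format dob %tdNN/DD/CCYY
drop dob_string

* 0/1 indicator of female gender
generate female = (gender == "female")

list
```

Because the data are typed into the command stream, the whole data set can be recreated by rerunning a do-file, which the Data Editor route does not allow.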

b. Paste your listing of data into lastname_lab1.docx, again inserting it into a table with 1 row and 1 column.

         +--------------------------------------------+
         | id   gender   weight          dob   female |
         |--------------------------------------------|
      1. |  1     male    161.3   03/26/1926     male |
      2. |  2   female    120.1   06/09/1956   female |
      3. |  3     male    223.2   04/01/1954     male |
      4. |  4   female      124   11/04/1951   female |
         +--------------------------------------------+

5. Numerical Descriptives

a. Execute the following commands to produce the numerical descriptives indicated.
b. Paste this portion of your Stata log into lastname_lab1.docx, again inserting it into a 1x1 table.

    clear
    use "http://www.pauldickman.com/survival/ivf.dta", clear
    (In Vitro Fertilization data)

    sort sex
    tabstat bweight, by(sex) col(stat) stat(n mean sd sem min q max)

    Summary for variables: bweight
         by categories of: sex (sex of the baby)

       sex |    N      mean        sd   se(mean)   min    p25    p50    p75    max
    -------+----------------------------------------------------------------------
      male |  326  3211.279  665.9798   36.88521   700   2900   3290   3610   4650
    female |  315  3044.127  628.6603     35.421   630   2800   3120   3400   4416
    -------+----------------------------------------------------------------------
     Total |  641  3129.137  652.7827   25.78336   630   2850   3200   3550   4650
    ------------------------------------------------------------------------------
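The se(mean) column reported by tabstat is the standard deviation divided by the square root of N. A small sketch of that check for the Total row, using the results that summarize stores in r():

```stata
use "http://www.pauldickman.com/survival/ivf.dta", clear

* summarize stores the sample size in r(N) and the standard deviation in r(sd)
quietly summarize bweight

* Standard error of the mean = sd / sqrt(N); compare with the Total row of tabstat
display "se(mean) = " r(sd) / sqrt(r(N))
```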

6. One and Two Sample Inference

a. Execute the following commands to produce the standard one and two sample tests.
b. Paste this portion of your Stata log into lastname_lab1.docx, again inserting it into a 1x1 table.

    ** ONE CONTINUOUS VARIABLE - 99% CI for mean using command ci
    ci gestwks, level(99)

        Variable |        Obs        Mean    Std. Err.       [99% Conf. Interval]
    -------------+---------------------------------------------------------------
         gestwks |        641    38.68725    .0920267         38.4495    38.92501

    ** ONE CONTINUOUS VARIABLE - Test of null: mean=40 using command ttest
    ttest gestwks=40

    One-sample t test
    ------------------------------------------------------------------------------
    Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
     gestwks |     641    38.68725    .0920267    2.329931    38.50654    38.86797
    ------------------------------------------------------------------------------
        mean = mean(gestwks)                                          t = -14.2648
    Ho: mean = 40                                    degrees of freedom =      640

        Ha: mean < 40               Ha: mean != 40                 Ha: mean > 40
     Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

    ** ONE CONTINUOUS VARIABLE - Test of null: standard deviation = 1 using command sdtest
    sdtest gestwks=1

    One-sample test of variance
    ------------------------------------------------------------------------------
    Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
     gestwks |     641    38.68725    .0920267    2.329931    38.50654    38.86797
    ------------------------------------------------------------------------------
          sd = sd(gestwks)                                     c = chi2 =  3.5e+03
    Ho: sd = 1                                       degrees of freedom =      640

         Ha: sd < 1                   Ha: sd != 1                    Ha: sd > 1
      Pr(C < c) = 1.0000         2*Pr(C > c) = 0.0000           Pr(C > c) = 0.0000
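The one-sample results can be cross-checked from the summary statistics alone, without the data in memory. A sketch using Stata's immediate commands and the values printed in the output (n = 641, mean = 38.68725, sd = 2.329931):

```stata
* Immediate one-sample t test: ttesti #obs #mean #sd #null_value
ttesti 641 38.68725 2.329931 40

* The same t statistic by hand: (mean - null) / (sd / sqrt(n))
display (38.68725 - 40) / (2.329931 / sqrt(641))

* Two-sided p-value from the t distribution with n - 1 = 640 degrees of freedom
display 2 * ttail(640, abs((38.68725 - 40) / (2.329931 / sqrt(641))))

* Immediate one-sample variance test: sdtesti #obs {#mean | .} #sd #null_sd
sdtesti 641 . 2.329931 1
```

The hand calculation should agree with the t statistic reported by ttest up to rounding of the inputs.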

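    ** TWO CONTINUOUS VARIABLES - Test of null: Equality of 2 INDEPENDENT means using ttest
    sort sex
    ttest gestwks, by(sex)

    Two-sample t test with equal variances
    ------------------------------------------------------------------------------
       Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
        male |     326    38.70021    .1224176    2.210307    38.45938    38.94105
      female |     315    38.67384    .1381012    2.451053    38.40212    38.94556
    ---------+--------------------------------------------------------------------
    combined |     641    38.68725    .0920267    2.329931    38.50654    38.86797
    ---------+--------------------------------------------------------------------
        diff |            .0263734    .1842216               -.3353795    .3881264
    ------------------------------------------------------------------------------
        diff = mean(male) - mean(female)                              t =   0.1432
    Ho: diff = 0                                     degrees of freedom =      639

        Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
     Pr(T < t) = 0.5569         Pr(|T| > |t|) = 0.8862          Pr(T > t) = 0.4431

    ttest gestwks, by(sex) unequal

    Two-sample t test with unequal variances
    ------------------------------------------------------------------------------
       Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
        male |     326    38.70021    .1224176    2.210307    38.45938    38.94105
      female |     315    38.67384    .1381012    2.451053    38.40212    38.94556
    ---------+--------------------------------------------------------------------
    combined |     641    38.68725    .0920267    2.329931    38.50654    38.86797
    ---------+--------------------------------------------------------------------
        diff |            .0263734    .1845481               -.3360336    .3887804
    ------------------------------------------------------------------------------
        diff = mean(male) - mean(female)                              t =   0.1429
    Ho: diff = 0                     Satterthwaite's degrees of freedom =  627.193

        Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
     Pr(T < t) = 0.5568         Pr(|T| > |t|) = 0.8864          Pr(T > t) = 0.4432

    ** TWO CONTINUOUS VARIABLES - Test of equality of 2 independent variances using sdtest
    sdtest gestwks, by(sex)

    Variance ratio test
    ------------------------------------------------------------------------------
       Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
        male |     326    38.70021    .1224176    2.210307    38.45938    38.94105
      female |     315    38.67384    .1381012    2.451053    38.40212    38.94556
    ---------+--------------------------------------------------------------------
    combined |     641    38.68725    .0920267    2.329931    38.50654    38.86797
    ------------------------------------------------------------------------------
        ratio = sd(male) / sd(female)                                 f =   0.8132
    Ho: ratio = 1                                    degrees of freedom = 325, 314

        Ha: ratio < 1               Ha: ratio != 1                 Ha: ratio > 1
      Pr(F < f) = 0.0324         2*Pr(F < f) = 0.0648           Pr(F > f) = 0.9676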
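The two-sample tests can likewise be reproduced from the group summaries with the immediate commands. A sketch using the group sizes, means, and standard deviations printed above (male first, then female):

```stata
* Two-sample t test, equal variances: ttesti #obs1 #mean1 #sd1 #obs2 #mean2 #sd2
ttesti 326 38.70021 2.210307 315 38.67384 2.451053

* Same comparison without assuming equal variances (Satterthwaite degrees of freedom)
ttesti 326 38.70021 2.210307 315 38.67384 2.451053, unequal

* Variance ratio test; the means are not needed, so they are given as .
sdtesti 326 . 2.210307 315 . 2.451053
```

Because the variance ratio test here is borderline (two-sided p near 0.06), running the t test both ways is a reasonable check that the conclusion does not depend on the equal-variance assumption.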

    ** 1 DISCRETE (0/1) VARIABLE - Test of binomial proportion using bitest and prtest
    ** Test of null: proportion of female births = .50
    generate female=(sex==2)
    bitest female=.50

        Variable |        N   Observed k   Expected k   Assumed p   Observed p
    -------------+------------------------------------------------------------
          female |      641        315        320.5       0.50000      0.49142

      Pr(k >= 315)             = 0.682223  (one-sided test)
      Pr(k <= 315)             = 0.346446  (one-sided test)
      Pr(k <= 315 or k >= 326) = 0.692892  (two-sided test)

    prtest female=.50

    One-sample test of proportion                 female: Number of obs =      641
    ------------------------------------------------------------------------------
        Variable |       Mean   Std. Err.                     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          female |   .4914197   .0197459                      .4527184    .5301209
    ------------------------------------------------------------------------------
        p = proportion(female)                                        z =  -0.4345
    Ho: p = 0.5

         Ha: p < 0.5                 Ha: p != 0.5                   Ha: p > 0.5
     Pr(Z < z) = 0.3320         Pr(|Z| > |z|) = 0.6639          Pr(Z > z) = 0.6680

    ** 1 DISCRETE (0/1) VARIABLE - 95% CI for event probability using ci & option binomial
    ci female, binomial level(95)

                                                             -- Binomial Exact --
        Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
    -------------+---------------------------------------------------------------
          female |        641    .4914197    .0197459        .4520545    .5308641

    ** 2 DISCRETE (0/1) VARIABLES - Test of equality of probabilities using prtest
    sort sex
    prtest hyp, by(sex)

    Two-sample test of proportions                  male: Number of obs =      325
                                                  female: Number of obs =      314
    ------------------------------------------------------------------------------
        Variable |       Mean   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            male |        .16   .0203356                      .1201429    .1998571
          female |   .1178344   .0181948                      .0821733    .1534955
    -------------+----------------------------------------------------------------
            diff |   .0421656   .0272871                     -.0113162    .0956474
                 |  under Ho:    .027398     1.54   0.124
    ------------------------------------------------------------------------------
            diff = prop(male) - prop(female)                          z =   1.5390
        Ho: diff = 0

        Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
     Pr(Z < z) = 0.9381         Pr(|Z| < |z|) = 0.1238          Pr(Z > z) = 0.0619
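The proportion tests depend only on counts, so they can be rerun with the immediate commands. A sketch using the counts shown in the output (315 female births out of 641; hypertension in 52 of 325 male and 37 of 314 female births):

```stata
* Exact binomial test: bitesti #N #successes #null_p
bitesti 641 315 0.5

* One-sample z test of a proportion; with count, the second number is a count
prtesti 641 315 0.5, count

* Two-sample z test of equal proportions, again supplying counts
prtesti 325 52 314 37, count
```

Without the count option, prtesti expects proportions rather than counts in the second and fourth positions.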

    ** 2 DISCRETE VARIABLES - chi square test using tab2 with option chi2
    tab2 sex hyp, row column chi2

    -> tabulation of sex by hyp

               | hypertension (1=yes,
    sex of the |         0=no)
          baby |         0          1 |     Total
    -----------+----------------------+----------
          male |       273         52 |       325
               |     84.00      16.00 |    100.00
               |     49.64      58.43 |     50.86
    -----------+----------------------+----------
        female |       277         37 |       314
               |     88.22      11.78 |    100.00
               |     50.36      41.57 |     49.14
    -----------+----------------------+----------
         Total |       550         89 |       639
               |     86.07      13.93 |    100.00
               |    100.00     100.00 |    100.00

              Pearson chi2(1) =   2.3685   Pr = 0.124

    ** 2 DISCRETE VARIABLES - Fisher exact test using tab2 with option exact
    tab2 sex hyp, row column exact

    -> tabulation of sex by hyp

               | hypertension (1=yes,
    sex of the |         0=no)
          baby |         0          1 |     Total
    -----------+----------------------+----------
          male |       273         52 |       325
               |     84.00      16.00 |    100.00
               |     49.64      58.43 |     50.86
    -----------+----------------------+----------
        female |       277         37 |       314
               |     88.22      11.78 |    100.00
               |     50.36      41.57 |     49.14
    -----------+----------------------+----------
         Total |       550         89 |       639
               |     86.07      13.93 |    100.00
               |    100.00     100.00 |    100.00

               Fisher's exact =                 0.138
       1-sided Fisher's exact =                 0.077
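Both tests can be requested in one pass from the four cell counts with the immediate form of tabulate. A sketch using the counts in the table above, with rows separated by a backslash:

```stata
* Rows: male, female; columns: hyp = 0, hyp = 1
tabi 273 52 \ 277 37, row column chi2 exact
```

With one degree of freedom, the Pearson chi-square statistic is the square of the z statistic from the two-sample prtest, which is why the two p-values agree.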

7. Simple and Multiple Linear Regression

a. Execute the following commands to produce some regressions.
b. Paste this portion of your Stata log into lastname_lab1.docx, again inserting it into a 1x1 table.

    clear
    * Depending on your Stata purchase, import EITHER hersdata.dta or hersdata1000.dta
    * Choice 1 of 2: hersdata.dta for Stata/IC
    use "http://people.umass.edu/biep640w/datasets/hersdata.dta", clear
    * Choice 2 of 2: hersdata1000.dta for SMALL Stata
    use "http://people.umass.edu/biep640w/datasets/hersdata1000.dta", clear

    ** SOLUTIONS shown here utilize the larger data set, hersdata.dta

    ** One predictor - continuous
    regress glucose BMI

          Source |       SS       df       MS              Number of obs =    2758
    -------------+------------------------------           F(  1,  2756) =  221.35
           Model |  278706.235     1  278706.235           Prob > F      =  0.0000
        Residual |  3470092.32  2756  1259.10462           R-squared     =  0.0743
    -------------+------------------------------           Adj R-squared =  0.0740
           Total |  3748798.55  2757  1359.73832           Root MSE      =  35.484

    ------------------------------------------------------------------------------
         glucose |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             BMI |   1.822176   .1224751    14.88   0.000     1.582024    2.062328
           _cons |   60.07405   3.564864    16.85   0.000     53.08398    67.06413
    ------------------------------------------------------------------------------

    ** One predictor - nominal physical activity with design variables
    xi: regress glucose i.physact
    i.physact         _Iphysact_1-5       (naturally coded; _Iphysact_1 omitted)

          Source |       SS       df       MS              Number of obs =    2763
    -------------+------------------------------           F(  4,  2758) =   16.51
           Model |  87696.4867     4  21924.1217           Prob > F      =  0.0000
        Residual |  3662764.97  2758  1328.05111           R-squared     =  0.0234
    -------------+------------------------------           Adj R-squared =  0.0220
           Total |  3750461.46  2762  1357.87888           Root MSE      =  36.442

    ------------------------------------------------------------------------------
         glucose |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
     _Iphysact_2 |  -7.766619   3.062946    -2.54   0.011    -13.77252   -1.760719
     _Iphysact_3 |  -10.43031   2.861203    -3.65   0.000    -16.04063   -4.819996
     _Iphysact_4 |   -17.4218   2.885509    -6.04   0.000    -23.07978   -11.76383
     _Iphysact_5 |  -20.86474   3.328876    -6.27   0.000    -27.39208   -14.33739
           _cons |   124.6294   2.596416    48.00   0.000     119.5383    129.7206
    ------------------------------------------------------------------------------
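In Stata 11 and later (so including versions 12 and 13), factor-variable notation produces the same design variables without the xi: prefix. A minimal sketch, assuming physact is coded with integer levels 1 through 5 as the _Iphysact_1-5 message suggests:

```stata
use "http://people.umass.edu/biep640w/datasets/hersdata.dta", clear

* i.physact creates indicators on the fly; the lowest level (1) is the reference group
regress glucose i.physact

* To use a different reference group, name it with ib#., for example level 5
regress glucose ib5.physact
```

Unlike xi:, factor-variable notation does not add _I* variables to the data set, so the coefficients are labelled by level of physact rather than by generated variable name.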

    ** Multiple predictor model with both BMI and physact
    xi: regress glucose BMI i.physact
    i.physact         _Iphysact_1-5       (naturally coded; _Iphysact_1 omitted)

          Source |       SS       df       MS              Number of obs =    2758
    -------------+------------------------------           F(  5,  2752) =   49.25
           Model |  307902.483     5  61580.4966           Prob > F      =  0.0000
        Residual |  3440896.07  2752  1250.32561           R-squared     =  0.0821
    -------------+------------------------------           Adj R-squared =  0.0805
           Total |  3748798.55  2757  1359.73832           Root MSE      =   35.36

    ------------------------------------------------------------------------------
         glucose |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             BMI |   1.678593   .1263181    13.29   0.000     1.430905     1.92628
     _Iphysact_2 |  -6.825454   2.978971    -2.29   0.022     -12.6667   -.9842086
     _Iphysact_3 |   -7.27145   2.792216    -2.60   0.009     -12.7465     -1.7964
     _Iphysact_4 |  -11.27201   2.842855    -3.97   0.000    -16.84635   -5.697664
     _Iphysact_5 |  -13.46989   3.281714    -4.10   0.000    -19.90477   -7.035022
           _cons |   72.75289   4.645653    15.66   0.000     63.64357     81.8622
    ------------------------------------------------------------------------------

    ** Partial F test of BMI controlling for physact: 1 df Partial F
    testparm BMI

     ( 1)  BMI = 0

           F(  1,  2752) =  176.59
                Prob > F =    0.0000

    ** Partial F test of physical activity controlling for BMI: 4 df Partial F
    testparm _Iphysact*

     ( 1)  _Iphysact_2 = 0
     ( 2)  _Iphysact_3 = 0
     ( 3)  _Iphysact_4 = 0
     ( 4)  _Iphysact_5 = 0

           F(  4,  2752) =    5.84
                Prob > F =    0.0001
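The 4 df partial F test reported by testparm can be rebuilt from the ANOVA tables of the two nested models, since both were fit on the same 2,758 observations. A sketch with the residual sums of squares copied from the regress output, decimal points as in Stata's display (the scalar names are illustrative):

```stata
* Residual SS of the reduced model (glucose on BMI only)
scalar sse_reduced = 3470092.32
* Residual SS and residual MS of the full model (glucose on BMI and physact)
scalar sse_full = 3440896.07
scalar mse_full = 1250.32561

* Partial F = [(SSE_reduced - SSE_full) / number of added terms] / MSE_full
scalar partialF = ((sse_reduced - sse_full) / 4) / mse_full
display "Partial F(4, 2752) = " partialF

* Upper-tail p-value from the F distribution
display "Prob > F = " Ftail(4, 2752, partialF)
```

The result should agree with the testparm output (F of about 5.84) up to rounding of the copied sums of squares. The same calculation would not be valid for comparing against the physact-only model above, because that model was fit on 2,763 observations.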