Stata versions 12 & 13
Week 4 Practice Problems
SOLUTIONS
Sol_statlab1.docx, revised 3/2/2014

1. Practice Screen Capture
   a. Create a Word document. Name it using the convention lastname_lab1.docx (e.g., bigelow_lab1.docx).
   b. Using your browser, go to the welcome page for PubHlth 640.
   c. From there, navigate to the assignments page.
   d. Capture the picture of the ostrich.
   e. Paste the picture into lastname_lab1.docx, inserting it into a table with 1 row and 1 column.

2. Launch Stata and Start a Log of Your Session
   IMPORTANT: Use the extension .log, not .smcl.
   a. Launch Stata.
   b. Start a log of your session with the extension .log. Name it lastname_log1.log (e.g., bigelow_log1.log).
   c. In the command window, type:
         set more off
3. Create a Graph, Save It, and Paste It into Your Word Document
   a. Launch Stata.
   b. In the command window, type:
         use http://www.pauldickman.com/survival/ivf.dta, clear
   c. In the command window, type:
         histogram hyp, discrete
   d. Save this as a .png graph named hypertension_bar.png on your desktop.
   e. Paste your graph into lastname_lab1.docx, again inserting it into a table with 1 row and 1 column.
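By default, Stata's histogram plots densities. With the discrete option, each distinct value gets its own bar of width 1, so each bar height equals the relative frequency of that value.

The short Python sketch below is an illustration, not Stata code. It assumes the non-missing hyp counts of 550 (hyp=0) and 89 (hyp=1) reported by tab2 in Problem 6.

```python
# Hypothetical check of the bar heights drawn by `histogram hyp, discrete`.
# Assumption: the counts below are the non-missing totals reported by tab2
# in Problem 6; they are not recomputed from ivf.dta here.

def discrete_densities(counts):
    """Return {value: count / total}; with bar width 1 this is the density."""
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}

hyp_counts = {0: 550, 1: 89}
densities = discrete_densities(hyp_counts)
for value in sorted(densities):
    print(f"hyp={value}: density {densities[value]:.4f}")
```

Because the bars have width 1, the two heights sum to 1.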
4. Create a New Data Set in Stata Using the Data Editor
   a. Execute the following commands to create a Stata data set with the following 4 observations:

         id          dob         gender               weight
         (numeric)   (date)      (string/character)   (numeric)
         1           3/26/1926   male                 161.3
         2           6/9/1956    female               120.1
         3           4/1/1954    male                 223.2
         4           11/4/1951   female               124.0

      * STEP 1: Clear the current data from memory
      clear

      * STEP 2: Define variables (lower case recommended). Set type. Initialize to missing.
      * str10 is wide enough for dates typed as mm/dd/yyyy
      generate id=.
      generate str10 dob_string=""
      generate str8 gender=""
      generate weight=.

      * STEP 3: Click on the DATA EDITOR icon to access an initially empty spreadsheet.
      *         Enter the data, then close the Data Editor window.

      * STEP 4: Create a DATE variable called dob (date of birth). Drop the string variable.
      generate dob=date(dob_string, "MDY")
      format dob %tdnn/dd/ccyy
      drop dob_string

      * STEP 5: Create 0/1 indicator of female gender
      generate female=(gender=="female")

      * STEP 6: Label variables
      label variable id "Subject id"
      label variable weight "weight (lbs)"
      label variable dob "Date of birth"
      label variable female "0/1 female"

      * STEP 7: Create discrete variable value labels (the dictionary)
      label define femalef 0 "male" 1 "female"

      * STEP 8: Attach labels to discrete variable values
      label values female femalef

      * Produce listing of data
      list

      * Save data set using FILE > SAVE AS
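To see what Steps 4 through 8 do to the data, here is a rough Python analogue. It is hypothetical and not part of the lab. The sketch:

- assumes Stata's convention that %td dates are stored as days since 01jan1960,
- builds the 0/1 female indicator, and
- uses a plain dict in place of the femalef value label.

```python
from datetime import date

# Assumption: Stata %td dates count days since 01jan1960; this mirrors that.
STATA_EPOCH = date(1960, 1, 1)

# Raw data as typed into the Data Editor: id, dob_string (M/D/Y), gender, weight (lbs)
raw = [
    (1, "3/26/1926", "male", 161.3),
    (2, "6/9/1956", "female", 120.1),
    (3, "4/1/1954", "male", 223.2),
    (4, "11/4/1951", "female", 124.0),
]

# Stand-in for: label define femalef 0 "male" 1 "female"
FEMALE_LABELS = {0: "male", 1: "female"}

def parse_mdy(s):
    """Convert 'M/D/YYYY' to days since 01jan1960, like date(s, "MDY")."""
    month, day, year = (int(part) for part in s.split("/"))
    return (date(year, month, day) - STATA_EPOCH).days

def format_td(days):
    """Render a day count like the display format %tdnn/dd/ccyy."""
    d = date.fromordinal(STATA_EPOCH.toordinal() + days)
    return f"{d.month:02d}/{d.day:02d}/{d.year:04d}"

rows = []
for id_, dob_string, gender, weight in raw:
    rows.append({
        "id": id_,
        "gender": gender,
        "weight": weight,
        "dob": parse_mdy(dob_string),        # numeric date; the string is not kept
        "female": int(gender == "female"),   # 0/1 indicator
    })

for r in rows:
    print(r["id"], r["gender"], r["weight"], format_td(r["dob"]),
          FEMALE_LABELS[r["female"]])
```

As in Stata, the stored dob is just an integer. The display format only changes how that integer is printed.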
   b. Paste your listing of data into lastname_lab1.docx, again inserting it into a table with 1 row and 1 column.

           +--------------------------------------------+
           | id   gender   weight          dob   female |
           |--------------------------------------------|
        1. |  1     male    161.3   03/26/1926     male |
        2. |  2   female    120.1   06/09/1956   female |
        3. |  3     male    223.2   04/01/1954     male |
        4. |  4   female      124   11/04/1951   female |
           +--------------------------------------------+

5. Numerical Descriptives
   a. Execute the following commands to produce the numerical descriptives indicated.
   b. Paste this portion of your Stata log into lastname_lab1.docx, again inserting it into a 1x1 table.

      clear
      use "http://www.pauldickman.com/survival/ivf.dta", clear
      (In Vitro Fertilization data)
      sort sex
      tabstat bweight, by(sex) col(stat) stat(n mean sd sem min q max)

      Summary for variables: bweight
           by categories of: sex (sex of the baby)

         sex |   N      mean        sd  se(mean)   min   p25   p50   p75   max
      -------+------------------------------------------------------------------
        male | 326  3211.279  665.9798  36.88521   700  2900  3290  3610  4650
      female | 315  3044.127  628.6603    35.421   630  2800  3120  3400  4416
      -------+------------------------------------------------------------------
       Total | 641  3129.137  652.7827  25.78336   630  2850  3200  3550  4650
      --------------------------------------------------------------------------
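The se(mean) column that tabstat prints is the standard deviation divided by the square root of N. The minimal Python check below uses the N and sd values copied from the output above, so it agrees with Stata only to the displayed precision.

```python
import math

def sem(sd, n):
    """Standard error of the mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)

# (N, sd) pairs copied from the tabstat output
groups = {"male": (326, 665.9798), "female": (315, 628.6603), "Total": (641, 652.7827)}
for name, (n, sd) in groups.items():
    print(f"{name:>6}: se(mean) = {sem(sd, n):.4f}")
```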
6. One and Two Sample Inference
   a. Execute the following commands to produce the standard one- and two-sample tests.
   b. Paste this portion of your Stata log into lastname_lab1.docx, again inserting it into a 1x1 table.

      ** ONE CONTINUOUS VARIABLE - 99% CI for mean using command ci
      ci gestwks, level(99)

          Variable |        Obs        Mean    Std. Err.       [99% Conf. Interval]
      -------------+---------------------------------------------------------------
           gestwks |        641    38.68725    .0920267         38.4495    38.92501

      ** ONE CONTINUOUS VARIABLE - Test of null: mean=40 using command ttest
      ttest gestwks=40

      One-sample t test
      ------------------------------------------------------------------------------
      Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
      ---------+--------------------------------------------------------------------
       gestwks |     641    38.68725    .0920267    2.329931    38.50654    38.86797
      ------------------------------------------------------------------------------
          mean = mean(gestwks)                                          t = -14.2648
      Ho: mean = 40                                    degrees of freedom =      640

          Ha: mean < 40               Ha: mean != 40                 Ha: mean > 40
       Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

      ** ONE CONTINUOUS VARIABLE - Test of null: standard deviation = 1 using command sdtest
      sdtest gestwks=1

      One-sample test of variance
      ------------------------------------------------------------------------------
      Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
      ---------+--------------------------------------------------------------------
       gestwks |     641    38.68725    .0920267    2.329931    38.50654    38.86797
      ------------------------------------------------------------------------------
          sd = sd(gestwks)                                       c = chi2 =  3.5e+03
      Ho: sd = 1                                       degrees of freedom =      640

          Ha: sd < 1                  Ha: sd != 1                    Ha: sd > 1
       Pr(C < c) = 1.0000         2*Pr(C > c) = 0.0000           Pr(C > c) = 0.0000
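The ttest and sdtest statistics above can be reproduced by hand from n, the mean, and the standard deviation:

- t = (mean - 40) / (sd / sqrt(n))
- chi2 = (n - 1) s^2 / sigma0^2, with sigma0 = 1

The minimal Python sketch below takes its inputs from the gestwks output, so the results match Stata only to rounding.

```python
import math

# Summary statistics copied from the gestwks output above
n, mean, sd = 641, 38.68725, 2.329931
mu0, sigma0 = 40, 1          # null values used by ttest and sdtest

se = sd / math.sqrt(n)                       # standard error of the mean
t_stat = (mean - mu0) / se                   # one-sample t, df = n - 1
chi2_stat = (n - 1) * sd**2 / sigma0**2      # one-sample variance test, df = n - 1

print(f"se = {se:.7f}  t = {t_stat:.4f}  chi2 = {chi2_stat:.1f}")
```

The chi-square value is roughly 3474, which Stata abbreviates as 3.5e+03.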
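      ** TWO CONTINUOUS VARIABLES - Test of null: Equality of 2 INDEPENDENT means using ttest
      sort sex
      ttest gestwks, by(sex)

      Two-sample t test with equal variances
      ------------------------------------------------------------------------------
         Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
      ---------+--------------------------------------------------------------------
          male |     326    38.70021    .1224176    2.210307    38.45938    38.94105
        female |     315    38.67384    .1381012    2.451053    38.40212    38.94556
      ---------+--------------------------------------------------------------------
      combined |     641    38.68725    .0920267    2.329931    38.50654    38.86797
      ---------+--------------------------------------------------------------------
          diff |            .0263734    .1842216               -.3353795    .3881264
      ------------------------------------------------------------------------------
          diff = mean(male) - mean(female)                              t =   0.1432
      Ho: diff = 0                                     degrees of freedom =      639

          Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
       Pr(T < t) = 0.5569         Pr(|T| > |t|) = 0.8862          Pr(T > t) = 0.4431

      ttest gestwks, by(sex) unequal

      Two-sample t test with unequal variances
      ------------------------------------------------------------------------------
         Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
      ---------+--------------------------------------------------------------------
          male |     326    38.70021    .1224176    2.210307    38.45938    38.94105
        female |     315    38.67384    .1381012    2.451053    38.40212    38.94556
      ---------+--------------------------------------------------------------------
      combined |     641    38.68725    .0920267    2.329931    38.50654    38.86797
      ---------+--------------------------------------------------------------------
          diff |            .0263734    .1845481               -.3360336    .3887804
      ------------------------------------------------------------------------------
          diff = mean(male) - mean(female)                              t =   0.1429
      Ho: diff = 0                     Satterthwaite's degrees of freedom =  627.193

          Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
       Pr(T < t) = 0.5568         Pr(|T| > |t|) = 0.8864          Pr(T > t) = 0.4432

      ** TWO CONTINUOUS VARIABLES - Test of equality of 2 independent variances using sdtest
      sdtest gestwks, by(sex)

      Variance ratio test
      ------------------------------------------------------------------------------
         Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
      ---------+--------------------------------------------------------------------
          male |     326    38.70021    .1224176    2.210307    38.45938    38.94105
        female |     315    38.67384    .1381012    2.451053    38.40212    38.94556
      ---------+--------------------------------------------------------------------
      combined |     641    38.68725    .0920267    2.329931    38.50654    38.86797
      ------------------------------------------------------------------------------
          ratio = sd(male) / sd(female)                                 f =   0.8132
      Ho: ratio = 1                                    degrees of freedom =  325, 314

          Ha: ratio < 1               Ha: ratio != 1                 Ha: ratio > 1
        Pr(F < f) = 0.0324         2*Pr(F < f) = 0.0648           Pr(F > f) = 0.9676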
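Several quantities in the two-sample output follow directly from the group summaries:

- the pooled standard error,
- the Satterthwaite (Welch) standard error and degrees of freedom, and
- the sdtest f statistic.

Note that f is the ratio of the two sample variances, even though the output labels it sd(male)/sd(female).

The Python sketch below assumes the rounded means, SDs, and Ns shown above. The t values therefore agree with Stata only to about three decimals.

```python
import math

# Group summaries copied from the output above (rounded values)
n1, m1, s1 = 326, 38.70021, 2.210307   # male
n2, m2, s2 = 315, 38.67384, 2.451053   # female
diff = m1 - m2

# Equal-variance t test: pooled variance, then SE of the difference
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
se_pooled = math.sqrt(sp2 * (1 / n1 + 1 / n2))
t_pooled = diff / se_pooled

# Unequal-variance t test with Satterthwaite's degrees of freedom
v1, v2 = s1**2 / n1, s2**2 / n2
se_welch = math.sqrt(v1 + v2)
t_welch = diff / se_welch
df_satt = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# sdtest's f statistic: ratio of sample variances, df = (n1 - 1, n2 - 1)
f_stat = s1**2 / s2**2

print(f"pooled: se = {se_pooled:.7f}, t = {t_pooled:.4f}")
print(f"Welch:  se = {se_welch:.7f}, t = {t_welch:.4f}, df = {df_satt:.3f}")
print(f"variance ratio f = {f_stat:.4f}")
```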
      ** 1 DISCRETE (0/1) VARIABLE - Test of binomial proportion using bitest and prtest
      ** Test of null: proportion of female births = .50
      generate female=(sex==2)
      bitest female=.50

          Variable |        N   Observed k   Expected k   Assumed p   Observed p
      -------------+------------------------------------------------------------
            female |      641        315        320.5       0.50000      0.49142

        Pr(k >= 315)             = 0.682223  (one-sided test)
        Pr(k <= 315)             = 0.346446  (one-sided test)
        Pr(k <= 315 or k >= 326) = 0.692892  (two-sided test)

      prtest female=.50

      One-sample test of proportion                  female: Number of obs =      641
      ------------------------------------------------------------------------------
          Variable |       Mean   Std. Err.                     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
            female |   .4914197   .0197459                      .4527184    .5301209
      ------------------------------------------------------------------------------
          p = proportion(female)                                        z =  -0.4345
      Ho: p = 0.5

           Ha: p < 0.5                 Ha: p != 0.5                 Ha: p > 0.5
       Pr(Z < z) = 0.3320         Pr(|Z| > |z|) = 0.6639          Pr(Z > z) = 0.6680

      ** 1 DISCRETE (0/1) VARIABLE - 95% CI for event probability using ci & option binomial
      ci female, binomial level(95)

                                                               -- Binomial Exact --
          Variable |        Obs        Mean    Std. Err.       [95% Conf. Interval]
      -------------+---------------------------------------------------------------
            female |        641    .4914197    .0197459        .4520545    .5308641

      ** 2 DISCRETE (0/1) VARIABLES - Test of equality of probabilities using prtest
      sort sex
      prtest hyp, by(sex)

      Two-sample test of proportions                     male: Number of obs =      325
                                                       female: Number of obs =      314
      ------------------------------------------------------------------------------
          Variable |       Mean   Std. Err.      z    P>|z|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
              male |        .16   .0203356                      .1201429    .1998571
            female |   .1178344   .0181948                      .0821733    .1534955
      -------------+----------------------------------------------------------------
              diff |   .0421656   .0272871                     -.0113162    .0956474
                   |  under Ho:    .027398     1.54   0.124
      ------------------------------------------------------------------------------
              diff = prop(male) - prop(female)                          z =   1.5390
          Ho: diff = 0

          Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
       Pr(Z < z) = 0.9381         Pr(|Z| < |z|) = 0.1238          Pr(Z > z) = 0.0619
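The two commands take different approaches:

- bitest reports exact binomial tail probabilities.
- prtest uses a normal approximation. Its z statistic divides by the standard error computed under the null hypothesis: p0(1 - p0)/n for one sample, and the pooled proportion for two samples.

Both can be sketched in Python with only the standard library. The counts are taken from the output above: 315 of 641 births female, and hyp=1 for 52 of 325 male and 37 of 314 female babies.

```python
import math

def binom_half_cdf(n, k):
    """Exact P(X <= k) for X ~ Binomial(n, 0.5)."""
    return sum(math.comb(n, j) for j in range(k + 1)) / 2 ** n

def binom_half_sf(n, k):
    """Exact P(X >= k) for X ~ Binomial(n, 0.5)."""
    return sum(math.comb(n, j) for j in range(k, n + 1)) / 2 ** n

# bitest female=.50: 315 females among 641 births
n, k = 641, 315
p_ge = binom_half_sf(n, k)
p_le = binom_half_cdf(n, k)
# Binomial(n, 0.5) is symmetric, so the mirror image of k=315 is n-k=326
p_two = binom_half_cdf(n, k) + binom_half_sf(n, n - k)

# prtest female=.50: z uses the null standard error sqrt(p0 (1 - p0) / n)
p_hat = k / n
z_one = (p_hat - 0.5) / math.sqrt(0.5 * 0.5 / n)

# prtest hyp, by(sex): pooled proportion for the SE under Ho
x1, n1 = 52, 325    # hyp=1 among male babies
x2, n2 = 37, 314    # hyp=1 among female babies
p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)
se0 = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z_two = (p1 - p2) / se0

print(f"bitest: P(k>=315)={p_ge:.6f}  P(k<=315)={p_le:.6f}  two-sided={p_two:.6f}")
print(f"prtest one-sample z = {z_one:.4f}; two-sample z = {z_two:.4f}")
```

Python integers have arbitrary precision, so the exact binomial sums involve no overflow even with 2**641 in the denominator.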
      ** 2 DISCRETE VARIABLES - chi square test using tab2 with option chi2
      tab2 sex hyp, row column chi2

      -> tabulation of sex by hyp

          sex of |  hypertension (1=yes,
        the baby |         0=no)
                 |         0          1 |     Total
      -----------+----------------------+----------
            male |       273         52 |       325
                 |     84.00      16.00 |    100.00
                 |     49.64      58.43 |     50.86
      -----------+----------------------+----------
          female |       277         37 |       314
                 |     88.22      11.78 |    100.00
                 |     50.36      41.57 |     49.14
      -----------+----------------------+----------
           Total |       550         89 |       639
                 |     86.07      13.93 |    100.00
                 |    100.00     100.00 |    100.00

                Pearson chi2(1) =   2.3685   Pr = 0.124

      ** 2 DISCRETE VARIABLES - Fisher exact test using tab2 with option exact
      tab2 sex hyp, row column exact

      -> tabulation of sex by hyp

          sex of |  hypertension (1=yes,
        the baby |         0=no)
                 |         0          1 |     Total
      -----------+----------------------+----------
            male |       273         52 |       325
                 |     84.00      16.00 |    100.00
                 |     49.64      58.43 |     50.86
      -----------+----------------------+----------
          female |       277         37 |       314
                 |     88.22      11.78 |    100.00
                 |     50.36      41.57 |     49.14
      -----------+----------------------+----------
           Total |       550         89 |       639
                 |     86.07      13.93 |    100.00
                 |    100.00     100.00 |    100.00

                 Fisher's exact =                 0.138
         1-sided Fisher's exact =                 0.077
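Both tests can be recomputed from the 2x2 table:

- The Pearson chi-square comes from observed and expected counts.
- Fisher's exact test comes from hypergeometric probabilities with the row and column totals held fixed.

The Python sketch below makes two assumptions. First, the two-sided Fisher p-value sums every table no more likely than the observed one, which is the usual definition. Second, the 1-df chi-square p-value uses the identity P(chi2 > c) = erfc(sqrt(c/2)).

```python
import math

# Observed 2x2 table from the tab2 output: rows = sex, columns = hyp (0, 1)
TABLE = [[273, 52],   # male
         [277, 37]]   # female

def pearson_chi2(t):
    """Return the Pearson chi-square statistic and its 1-df p-value."""
    rows = [sum(r) for r in t]
    cols = [t[0][j] + t[1][j] for j in range(2)]
    n = sum(rows)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            chi2 += (t[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(X^2 > c) = erfc(sqrt(c / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))

def fisher_exact(t):
    """Return (two-sided, one-sided upper-tail) Fisher p-values.

    With margins fixed, the count in cell [0][1] (male, hyp=1) is hypergeometric.
    """
    n_row0 = sum(t[0])                  # male total
    n_col1 = t[0][1] + t[1][1]          # hyp=1 total
    n = n_row0 + sum(t[1])              # grand total
    denom = math.comb(n, n_row0)

    def prob(a):
        return math.comb(n_col1, a) * math.comb(n - n_col1, n_row0 - a) / denom

    lo = max(0, n_row0 - (n - n_col1))
    hi = min(n_row0, n_col1)
    observed = t[0][1]
    p_obs = prob(observed)
    probs = {a: prob(a) for a in range(lo, hi + 1)}
    # Small relative tolerance so tables tied with the observed one are included
    two_sided = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-7))
    one_sided = sum(p for a, p in probs.items() if a >= observed)
    return two_sided, one_sided

chi2, p_chi2 = pearson_chi2(TABLE)
p_two, p_one = fisher_exact(TABLE)
print(f"Pearson chi2(1) = {chi2:.4f}  Pr = {p_chi2:.3f}")
print(f"Fisher's exact = {p_two:.3f}  1-sided = {p_one:.3f}")
```

For a 2x2 table, the Pearson chi-square equals the square of the pooled two-sample z from prtest (1.5390 squared is about 2.3685). This is why both report Pr = 0.124.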
7. Simple and Multiple Linear Regression
   a. Execute the following commands to produce some regressions.
   b. Paste this portion of your Stata log into lastname_lab1.docx, again inserting it into a 1x1 table.

      clear
      * Depending on your Stata purchase, import EITHER hersdata.dta or hersdata1000.dta
      * Choice 1 of 2: hersdata.dta for Stata/IC
      use "http://people.umass.edu/biep640w/datasets/hersdata.dta", clear
      * Choice 2 of 2: hersdata1000.dta for Small Stata
      use "http://people.umass.edu/biep640w/datasets/hersdata1000.dta", clear

      ** SOLUTIONS shown here utilize the larger data set, hersdata.dta

      ** One predictor - continuous
      regress glucose BMI

            Source |       SS       df       MS              Number of obs =    2758
      -------------+------------------------------           F(  1,  2756) =  221.35
             Model |  278706.235     1  278706.235           Prob > F      =  0.0000
          Residual |  3470092.32  2756  1259.10462           R-squared     =  0.0743
      -------------+------------------------------           Adj R-squared =  0.0740
             Total |  3748798.55  2757  1359.73832           Root MSE      =  35.484

      ------------------------------------------------------------------------------
           glucose |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
               BMI |   1.822176   .1224751    14.88   0.000     1.582024    2.062328
             _cons |   60.07405   3.564864    16.85   0.000     53.08398    67.06413
      ------------------------------------------------------------------------------

      ** One predictor - nominal physical activity with design variables
      xi: regress glucose i.physact
      i.physact         _Iphysact_1-5       (naturally coded; _Iphysact_1 omitted)

            Source |       SS       df       MS              Number of obs =    2763
      -------------+------------------------------           F(  4,  2758) =   16.51
             Model |  87696.4867     4  21924.1217           Prob > F      =  0.0000
          Residual |  3662764.97  2758  1328.05111           R-squared     =  0.0234
      -------------+------------------------------           Adj R-squared =  0.0220
             Total |  3750461.46  2762  1357.87888           Root MSE      =  36.442

      ------------------------------------------------------------------------------
           glucose |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
       _Iphysact_2 |  -7.766619   3.062946    -2.54   0.011    -13.77252   -1.760719
       _Iphysact_3 |  -10.43031   2.861203    -3.65   0.000    -16.04063   -4.819996
       _Iphysact_4 |   -17.4218   2.885509    -6.04   0.000    -23.07978   -11.76383
       _Iphysact_5 |  -20.86474   3.328876    -6.27   0.000    -27.39208   -14.33739
             _cons |   124.6294   2.596416    48.00   0.000     119.5383    129.7206
      ------------------------------------------------------------------------------
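Every summary statistic in the regress header follows from the ANOVA table:

- F = MS(Model) / MS(Residual)
- R-squared = SS(Model) / SS(Total)
- Adjusted R-squared = 1 - MS(Residual) / MS(Total)
- Root MSE = sqrt(MS(Residual))

The Python sketch below uses sums of squares copied from the two outputs above. The helper name anova_summary is illustrative only.

```python
import math

def anova_summary(ss_model, df_model, ss_resid, df_resid):
    """Return F, R-squared, adjusted R-squared and root MSE from an ANOVA table."""
    ms_model = ss_model / df_model
    ms_resid = ss_resid / df_resid
    ss_total = ss_model + ss_resid
    df_total = df_model + df_resid
    f_stat = ms_model / ms_resid
    r2 = ss_model / ss_total
    adj_r2 = 1 - ms_resid / (ss_total / df_total)
    root_mse = math.sqrt(ms_resid)
    return f_stat, r2, adj_r2, root_mse

# regress glucose BMI
f1, r2_1, adj1, rmse1 = anova_summary(278706.235, 1, 3470092.32, 2756)
# xi: regress glucose i.physact
f2, r2_2, adj2, rmse2 = anova_summary(87696.4867, 4, 3662764.97, 2758)

for label, (f, r2, adj, rmse) in {"BMI": (f1, r2_1, adj1, rmse1),
                                  "physact": (f2, r2_2, adj2, rmse2)}.items():
    print(f"{label}: F = {f:.2f}, R2 = {r2:.4f}, adj R2 = {adj:.4f}, Root MSE = {rmse:.3f}")
```

In the one-predictor BMI model, F also equals the square of the BMI t statistic: 14.88 squared is about 221.4.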
      ** Multiple predictor model with both BMI and physact
      xi: regress glucose BMI i.physact
      i.physact         _Iphysact_1-5       (naturally coded; _Iphysact_1 omitted)

            Source |       SS       df       MS              Number of obs =    2758
      -------------+------------------------------           F(  5,  2752) =   49.25
             Model |  307902.483     5  61580.4966           Prob > F      =  0.0000
          Residual |  3440896.07  2752  1250.32561           R-squared     =  0.0821
      -------------+------------------------------           Adj R-squared =  0.0805
             Total |  3748798.55  2757  1359.73832           Root MSE      =   35.36

      ------------------------------------------------------------------------------
           glucose |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
               BMI |   1.678593   .1263181    13.29   0.000     1.430905     1.92628
       _Iphysact_2 |  -6.825454   2.978971    -2.29   0.022     -12.6667   -.9842086
       _Iphysact_3 |   -7.27145   2.792216    -2.60   0.009     -12.7465     -1.7964
       _Iphysact_4 |  -11.27201   2.842855    -3.97   0.000    -16.84635   -5.697664
       _Iphysact_5 |  -13.46989   3.281714    -4.10   0.000    -19.90477   -7.035022
             _cons |   72.75289   4.645653    15.66   0.000     63.64357     81.8622
      ------------------------------------------------------------------------------

      ** Partial F test of BMI controlling for physact: 1 df Partial F
      testparm BMI

       ( 1)  BMI = 0

             F(  1,  2752) =  176.59
                  Prob > F =    0.0000

      ** Partial F test of physical activity controlling for BMI: 4 df Partial F
      testparm _Iphysact*

       ( 1)  _Iphysact_2 = 0
       ( 2)  _Iphysact_3 = 0
       ( 3)  _Iphysact_4 = 0
       ( 4)  _Iphysact_5 = 0

             F(  4,  2752) =    5.84
                  Prob > F =    0.0001
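The BMI-only model and the BMI + physact model were fit to the same 2758 observations, and both report the same Total SS. That means the 4-df partial F for physical activity can be reproduced from the extra sum of squares:

F = [(SS_model,full - SS_model,reduced) / 4] / MS_residual,full

Here the reduced model is the BMI-only fit. The 1-df partial F for BMI equals the square of its t statistic.

The Python sketch below uses values copied from the regress outputs.

```python
# Values copied from the two regress outputs (same 2758 observations in both)
ss_model_reduced = 278706.235     # regress glucose BMI
ss_model_full = 307902.483        # xi: regress glucose BMI i.physact
ss_resid_full, df_resid_full = 3440896.07, 2752
q = 4                              # indicators tested: _Iphysact_2 to _Iphysact_5

# Extra-sum-of-squares F test for the physact indicators, controlling for BMI
ms_resid_full = ss_resid_full / df_resid_full
f_physact = ((ss_model_full - ss_model_reduced) / q) / ms_resid_full

# For a single coefficient, the partial F equals the squared t statistic
t_bmi = 1.678593 / 0.1263181
f_bmi = t_bmi ** 2

print(f"physact: F({q}, {df_resid_full}) = {f_physact:.2f}")
print(f"BMI:     F(1, {df_resid_full}) = {f_bmi:.2f}")
```

The shortcut only works because both fits use the same observations. If the reduced model had dropped or added cases, its sums of squares would not be comparable, and testparm after the full model would be the safer route.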