Regression III: Lab 4


This lab will work through some model/variable selection problems, finite mixture models and missing data issues. You shouldn't feel obligated to work through this linearly; I would rather suggest that you start with the topics you're most interested in.

Model Selection

In class we looked at some data we used in the mixture model on democracy. Using the dataset indicated below, test two different models of liberal-conservative identification.

> library(foreign)
> dat <- read.dta('
> dat$libcpre_self_num <- as.numeric(dat$libcpre_self)
> dat$dem_agegrp_num <- as.numeric(dat$dem_agegrp)

Use the methods we discussed in class to adjudicate between the following two models.

Model 1 Estimate a linear model where you predict libcpre_self_num with indsocial, relig_chmember, dem_edugroup, dem_agegrp_num and gender_respondent. This model represents social determinants of self-identification along with some controls.

Model 2 Estimate a linear model where you predict libcpre_self_num with indspend, inc_incgroup_pre, dem_edugroup, dem_agegrp_num and gender_respondent. This model represents a more economically driven identification.

> m1 <- lm(libcpre_self_num ~ indsocial + relig_chmember + dem_edugroup +
+     dem_agegrp_num + gender_respondent, data=na.omit(dat))
> m2 <- lm(libcpre_self_num ~ indspend + inc_incgroup_pre + dem_edugroup +
+     dem_agegrp_num + gender_respondent, data=na.omit(dat))

Now you can do the testing. They could use whatever they like, but I'll look at AIC, BIC, Vuong and Clarke:

> AIC(m1)
[1]
> AIC(m2)
[1]
> BIC(m1)
[1]
> BIC(m2)
[1]
> library(games)
> clarke(m1, m2)

Clarke test for non-nested models

Model 1 log-likelihood:
Model 2 log-likelihood:
Observations: 2007
Test statistic: 1384 (69%)

Model 1 is preferred (p < 2e-16)

Now, use model averaging to evaluate the relative importance of the social and spending variables and to get model coefficients that incorporate selection uncertainty.

> library(MuMIn)
> mods <- list(m1, m2)
> summary(model.avg(m1, m2))

Call: model.avg(object = m1, m2)

Component model call:
lm(formula = <2 unique values>, data = na.omit(dat))

Component models:
   df logLik AICc delta weight

Term codes:
dem_agegrp_num  dem_edugroup  gender_respondent  inc_incgroup_pre  indsocial  indspend  relig_chmember

Model-averaged coefficients:
(full average)
                           Estimate Std. Error Adjusted SE z value Pr(>|z|)
(Intercept)                4.120e   e          e                   < 2e-16
indsocial                  1.212e   e          e                   < 2e-16
relig_chmemberNo           e        e          e                   e-05
dem_edugroupHS             1.492e   e          e
dem_edugroupHS but no BA/S 1.588e   e          e
dem_edugroupBA/S           1.374e   e          e
dem_edugroupGrad Deg       e        e          e
dem_agegrp_num             4.392e   e          e                   e-07
gender_respondent          e        e          e
indspend                   2.926e   e          e
inc_incgroup_pre           4.251e   e          e

(Intercept)                ***
indsocial                  ***
relig_chmemberNo           ***
dem_edugroupHS
dem_edugroupHS but no BA/S
dem_edugroupBA/S
dem_edugroupGrad Deg
dem_agegrp_num             ***
gender_respondent          **
indspend
inc_incgroup_pre

(conditional average)
                           Estimate Std. Error Adjusted SE z value Pr(>|z|)
(Intercept)                                                        < 2e-16
indsocial                                                          < 2e-16
relig_chmemberNo                                                   e-05
dem_edugroupHS
dem_edugroupHS but no BA/S
dem_edugroupBA/S
dem_edugroupGrad Deg
dem_agegrp_num                                                     e-07
gender_respondent
indspend                                                           < 2e-16
inc_incgroup_pre

(Intercept)                ***
indsocial                  ***
relig_chmemberNo           ***
dem_edugroupHS
dem_edugroupHS but no BA/S
dem_edugroupBA/S
dem_edugroupGrad Deg
dem_agegrp_num             ***
gender_respondent          **
indspend                   ***
inc_incgroup_pre
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Relative variable importance:
                     dem_agegrp_num dem_edugroup gender_respondent indsocial
Importance:
N containing models:
                     relig_chmember inc_incgroup_pre indspend
Importance:          1              <0.01            <0.01
N containing models:
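Two small additions to the comparisons above. First, the Vuong test is listed alongside AIC, BIC and Clarke but not shown; a minimal sketch, assuming the vuong() function that ships alongside clarke() in the games package (a positive statistic favors model 1, a negative one model 2):

> library(games)
> vuong(m1, m2)

Second, the component-model weights in the model-averaging output are Akaike weights, which you can reproduce by hand. A minimal sketch, assuming the table is based on AICc from MuMIn (the same arithmetic works for AIC or BIC):

> library(MuMIn)
> ic <- c(AICc(m1), AICc(m2))          # information criterion for each candidate model
> delta <- ic - min(ic)                # differences from the best-fitting model
> exp(-delta/2)/sum(exp(-delta/2))     # Akaike weights; these sum to 1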

Finite Mixture Models

Using the data above, estimate a finite mixture model where you assume that indsocial and indspend operationalize the two different theories (don't use relig_chmember and inc_incgroup_pre in the models yet). Evaluate the resulting model against the linear model where there is an interaction between indsocial and indspend, including the other controls.

Run an OLS regression of libcpre_self_num on indsocial (a composite of social policy attitudes) and indspend (a composite of spending policy attitudes), their interaction and the controls gender_respondent, dem_edugroup and dem_agegrp_num.

> library(flexmix)
> dat <- read.dta('
> dat$libcpre_self_num <- as.numeric(dat$libcpre_self)
> dat$dem_agegrp_num <- as.numeric(dat$dem_agegrp)
> mod <- lm(libcpre_self_num ~ indsocial*indspend + gender_respondent + dem_edugroup + dem_agegrp_num,
+     data=dat)
> library(DAMisc)
> DAintfun2(mod, c("indsocial", "indspend"), hist=TRUE, scale.hist=.3)

[Figure: conditional effect of INDSOCIAL across the range of INDSPEND, and conditional effect of INDSPEND across the range of INDSOCIAL]

Estimate a finite mixture model where indsocial is the variable of interest in one component and indspend is the variable of interest in the other. That is, set the indspend coefficient to zero in the model with indsocial and indsocial's coefficient to zero in the model with indspend. Fix the coefficients on the other regressors to be constant across the two components.
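As an aside, the conditional effects that DAintfun2 plots above can also be computed directly from coef(mod) and vcov(mod). A minimal sketch, assuming indsocial and indspend are numeric composites as described; the 25-point grid of indspend values is an illustrative choice:

> b <- coef(mod)
> V <- vcov(mod)
> z <- seq(min(dat$indspend, na.rm=TRUE), max(dat$indspend, na.rm=TRUE), length=25)
> eff <- b["indsocial"] + b["indsocial:indspend"]*z      # effect of indsocial at each value of indspend
> se <- sqrt(V["indsocial","indsocial"] + z^2*V["indsocial:indspend","indsocial:indspend"] +
+     2*z*V["indsocial","indsocial:indspend"])
> cbind(indspend=z, effect=eff, lower=eff - 1.96*se, upper=eff + 1.96*se)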

> model <- FLXMRglmfix(family = "gaussian", nested=list(k=c(1,1),
+     formula = c(~indsocial, ~indspend)),
+     fixed= ~ dem_agegrp_num + gender_respondent + dem_edugroup)
> out.a <- stepFlexmix(libcpre_self_num ~ 1, k=2, model=model, data=dat, nrep=20)
2 : * * * * * * * * * * * * * * * * * * * *
> mod.refit <- refit(out.a)

How do the two different models fit? Which do you think is better?

> fit1 <- predict(out.a)$Comp.1[[1]]
> fit2 <- predict(out.a)$Comp.2[[1]]
> post <- out.a@posterior$scaled
> predfix1 <- rowSums(cbind(fit1, fit2)*post)
> ## Correlation of fitted values and observed values from the mixture model
> cor(predfix1, dat$libcpre_self)^2
[1]
> ## Correlation using only the fits from the best-predicting theory
> predfix2 <- fit1
> predfix2[which(post[,2] > .5)] <- fit2[which(post[,2] > .5)]
> cor(predfix2, dat$libcpre_self)^2
     [,1]
[1,]
> ## Correlation of fitted values and observed values from the linear model
> cor(mod$fitted, dat$libcpre_self)^2
[1]

Consider that income (an economic predictor operationalized by inc_incgroup_pre) and church membership (a social predictor operationalized by relig_chmember) might tell us something about the probabilities of being in one or the other group. Incorporate that information and see whether, in fact, they do provide information about group membership.

> out.b <- stepFlexmix(libcpre_self ~ 1, k=2, model=model, data=dat, nrep=20,
+     concomitant = FLXPmultinom( ~ inc_incgroup_pre + relig_chmember))
2 : * * * * * * * * * * * * * * * * * * * *
> out.b.refit <- refit(out.b)
> out.b.refit@concomitant
$Comp.2
                     Estimate Std. Error z value Pr(>|z|)
(Intercept)                                              *
inc_incgroup_pre                                    e-07 ***
relig_chmember2. No

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
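To check whether income and church membership actually add information about group membership, you can also compare the two mixtures on an information criterion. A minimal sketch, assuming the logLik/BIC and summary methods that flexmix provides apply to the fitted objects returned above:

> BIC(out.a)      # mixture without concomitant variables
> BIC(out.b)      # mixture with the concomitant (membership) model
> summary(out.b)  # component sizes, posteriors and log-likelihood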

Missing Data and Multiple Imputation

Using lab3b_data.dta, do multiple imputation (with 5 imputations) on all of the variables in the dataset. You can download the data with:

> library(foreign)
> dat <- read.dta('
> dat$libcpre_self <- as.numeric(dat$libcpre_self)
> dat$dem_agegrp <- as.numeric(dat$dem_agegrp)

See how the coefficients in a model of libcpre_self on all the other variables in the dataset change from listwise deletion to multiple imputation.

> library(mice)
> library(mitools)
> mice.out <- mice(dat, printFlag=F)
 iter imp variable
  1   1  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  1   2  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  1   3  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  1   4  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  1   5  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  2   1  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  2   2  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  2   3  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  2   4  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  2   5  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  3   1  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  3   2  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  3   3  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  3   4  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  3   5  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  4   1  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  4   2  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  4   3  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  4   4  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  4   5  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  5   1  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  5   2  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  5   3  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  5   4  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  5   5  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
> mod.lm <- lm(libcpre_self ~ indsocial + indspend + dem_agegrp +
+     gender_respondent + dem_edugroup + inc_incgroup_pre + relig_chmember, data=dat)
> mod.mids <- lm.mids(libcpre_self ~ indsocial + indspend + dem_agegrp +
+     gender_respondent + dem_edugroup + inc_incgroup_pre + relig_chmember, data=mice.out)
> summary(mod.lm)

Call:
lm(formula = libcpre_self ~ indsocial + indspend + dem_agegrp +
    gender_respondent + dem_edugroup + inc_incgroup_pre + relig_chmember,
    data = dat)

Residuals:
   Min     1Q Median     3Q    Max

Coefficients:
                           Estimate Std. Error t value Pr(>|t|)
(Intercept)                                             < 2e-16 ***
indsocial                                               < 2e-16 ***
indspend                                                   e-15 ***
dem_agegrp                                                 e-06 ***
gender_respondent                                               *
dem_edugroupHS
dem_edugroupHS but no BA/S
dem_edugroupBA/S
dem_edugroupGrad Deg                                            *
inc_incgroup_pre
relig_chmemberNo                                           e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error:  on 1996 degrees of freedom
  (3909 observations deleted due to missingness)
Multiple R-squared: ,  Adjusted R-squared:
F-statistic:  on 10 and 1996 DF,  p-value: < 2.2e-16

> summary(pool(mod.mids))
                   est se t df Pr(>|t|)
(Intercept)                       e+00
indsocial                         e+00
indspend                          e+00
dem_agegrp                        e-09
gender_respondent                 e-02
dem_edugroup                      e-02
dem_edugroup                      e-03
dem_edugroup                      e-03
dem_edugroup                      e-01
inc_incgroup_pre                  e-01
relig_chmember                    e-07
                   lo 95 hi 95 nmis fmi lambda
(Intercept)                     NA
indsocial
indspend
dem_agegrp
gender_respondent
dem_edugroup                    NA
dem_edugroup                    NA
dem_edugroup                    NA
dem_edugroup                    NA
inc_incgroup_pre
relig_chmember                  NA
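mitools is loaded above but never used; the same pooling can also be run through it. A minimal sketch, repeating the model formula above and assuming a mice version in which complete(..., action="all") returns the list of completed data sets:

> library(mitools)
> imps <- imputationList(complete(mice.out, action="all"))
> fits <- with(imps, lm(libcpre_self ~ indsocial + indspend + dem_agegrp +
+     gender_respondent + dem_edugroup + inc_incgroup_pre + relig_chmember))
> summary(MIcombine(fits))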

How do the results change if you use 10 imputations instead of 5?

> mice.out2 <- mice(dat, m=10, printFlag=F)
 iter imp variable
  1   1  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  1   2  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  1   3  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  1   4  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  1   5  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  1   6  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  1   7  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  1   8  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  1   9  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  1  10  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  2   1  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  2   2  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  2   3  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  2   4  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  2   5  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  2   6  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  2   7  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  2   8  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  2   9  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  2  10  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  3   1  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  3   2  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  3   3  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  3   4  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  3   5  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  3   6  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  3   7  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  3   8  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  3   9  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  3  10  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  4   1  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  4   2  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  4   3  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  4   4  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  4   5  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  4   6  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  4   7  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  4   8  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  4   9  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  4  10  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  5   1  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  5   2  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  5   3  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  5   4  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  5   5  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  5   6  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  5   7  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  5   8  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  5   9  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
  5  10  indsocial  indspend  dem_edugroup  dem_agegrp  libcpre_self  inc_incgroup_pre  relig_chmember
> mod.mids2 <- lm.mids(as.numeric(libcpre_self) ~ indsocial + indspend +
+     as.numeric(dem_agegrp) + gender_respondent + dem_edugroup + inc_incgroup_pre +
+     relig_chmember, data=mice.out2)
> summary(pool(mod.mids2))
                       est se t df
(Intercept)
indsocial
indspend
as.numeric(dem_agegrp)
gender_respondent
dem_edugroup
dem_edugroup
dem_edugroup
dem_edugroup
inc_incgroup_pre
relig_chmember
                       Pr(>|t|) lo 95 hi 95 nmis fmi
(Intercept)            e                    NA
indsocial              e
indspend               e
as.numeric(dem_agegrp) e                    NA
gender_respondent      e
dem_edugroup           e                    NA
dem_edugroup           e                    NA
dem_edugroup           e                    NA
dem_edugroup           e                    NA
inc_incgroup_pre       e
relig_chmember         e                    NA
                       lambda
(Intercept)
indsocial
indspend
as.numeric(dem_agegrp)
gender_respondent
dem_edugroup
dem_edugroup
dem_edugroup
dem_edugroup
inc_incgroup_pre
relig_chmember
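One way to see whether moving from 5 to 10 imputations matters is to put the two sets of pooled standard errors and fractions of missing information side by side. A minimal sketch, assuming the older mice layout shown above, where summary(pool(...)) returns a matrix with "se" and "fmi" columns and the two pooled fits have their terms in the same order:

> s5  <- summary(pool(mod.mids))     # pooled results from m = 5
> s10 <- summary(pool(mod.mids2))    # pooled results from m = 10
> cbind(se.m5 = s5[, "se"], se.m10 = s10[, "se"],
+       fmi.m5 = s5[, "fmi"], fmi.m10 = s10[, "fmi"])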

Consider two situations: one where non-responders are one standard deviation more liberal than responders, and one where they are one standard deviation more conservative than responders. Do your results change at all?

> library(sensmiceda)
> lrvals <- c(-1,1)*sd(dat$libcpre_self, na.rm=TRUE)
> out <- sens.est(mice.out, list(libcpre_self = lrvals))
libcpre_self pmm
Summary :
     Variable       Method SupPar
[1,] "libcpre_self" "pmm"  " "
libcpre_self pmm
Summary :
     Variable       Method SupPar
[1,] "libcpre_self" "pmm"  " "
> pool.dat <- sens.pool(mod.lm, out, mice.out)
Multiple imputation results:
      MIcombine.default(X[[i]], ...)
                  results se (lower upper) missInfo
(Intercept)                                       %
indsocial                                         %
indspend                                          %
dem_agegrp                                        %
gender_respondent                                 %
dem_edugroup                                      %
dem_edugroup                                      %
dem_edugroup                                      %
dem_edugroup                                      %
inc_incgroup_pre                                  %
relig_chmember                                    %
Multiple imputation results:
      MIcombine.default(X[[i]], ...)
                  results se (lower upper) missInfo
(Intercept)                                       %
indsocial                                         %
indspend                                          %
dem_agegrp                                        %
gender_respondent                                 %
dem_edugroup                                      %
dem_edugroup                                      %
dem_edugroup                                      %
dem_edugroup                                      %
inc_incgroup_pre                                  %
relig_chmember                                    %
Multiple imputation results:
      MIcombine.default(X[[i]], ...)
                  results se (lower upper) missInfo
(Intercept)                                       %
indsocial                                         %
indspend                                          %
dem_agegrp                                        %
gender_respondent                                 %
dem_edugroup                                      %
dem_edugroup                                      %
dem_edugroup                                      %
dem_edugroup                                      %
inc_incgroup_pre                                  %
relig_chmember                                    %
> plot(pool.dat)

[Figure: pooled coefficients with 95% confidence intervals, plotted in separate panels for the shifted imputations (labeled "libcpre_self: 1.47") and the original mice imputations]
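If the sens.est/sens.pool helpers are unavailable, the same one-standard-deviation shift can be built directly into mice through its post-processing hook, following the delta-adjustment pattern documented for mice's post argument. A minimal sketch; the object names and the single positive shift (the "more conservative" scenario) are illustrative:

> ini <- mice(dat, maxit=0)                        # dry run to extract default settings
> post <- ini$post
> delta <- sd(dat$libcpre_self, na.rm=TRUE)        # one-SD shift; use -delta for the "more liberal" case
> post["libcpre_self"] <- paste("imp[[j]][, i] <- imp[[j]][, i] +", delta)
> mice.shift <- mice(dat, post=post, printFlag=FALSE)
> fit.shift <- with(mice.shift, lm(libcpre_self ~ indsocial + indspend + dem_agegrp +
+     gender_respondent + dem_edugroup + inc_incgroup_pre + relig_chmember))
> summary(pool(fit.shift))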
